{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "xFCx6jZU3m11"
   },
   "source": [
    "<!-- Banner Image -->\n",
    "<img src=\"https://uohmivykqgnnbiouffke.supabase.co/storage/v1/object/public/landingpage/brev-xmas-3.png\" width=\"100%\">\n",
    "\n",
    "<!-- Links -->\n",
    "<center>\n",
    "  <a href=\"https://console.brev.dev\" style=\"color: #06b6d4;\">Console</a> •\n",
    "  <a href=\"https://brev.dev\" style=\"color: #06b6d4;\">Docs</a> •\n",
    "  <a href=\"/\" style=\"color: #06b6d4;\">Templates</a> •\n",
    "  <a href=\"https://discord.gg/NVDyv7TUgJ\" style=\"color: #06b6d4;\">Discord</a>\n",
    "</center>\n",
    "\n",
    "# Fine-tuning Microsoft's Phi-2 on your own data 🤙\n",
    "\n",
    "Welcome!\n",
    "\n",
    "In this notebook and tutorial, we will fine-tune [Microsoft's Phi-2](https://huggingface.co/microsoft/phi-2), a relatively small 2.7B-parameter model that has \"showcased a nearly state-of-the-art performance among models with less than 13 billion parameters\", ***on your own data!***\n",
    "\n",
    "## Watch the accompanying video walk-through (but for Mistral 7B) [here](https://youtu.be/kmkcNVvEz-k?si=Ogt1wRFNqYI6zXfw&t=1)! \n",
    "If you'd like to see a notebook to fine-tune Phi-2 on a Hugging Face dataset instead, click [here](https://github.com/brevdev/notebooks/blob/main/phi2-finetune-own-data.ipynb).\n",
    "\n",
    "I did this for **just one dollar ($1)** on a single A10G (24GB) from Brev.dev (instructions below).\n",
    "\n",
    "This tutorial will use QLoRA, a fine-tuning method that combines quantization and LoRA. For more information about what those are and how they work, see [this post](https://brev.dev/blog/how-qlora-works).\n",
    "\n",
    "Note that if you ever have trouble loading something from Hugging Face, you may need to run `huggingface-cli login` in a shell. To open a shell in Jupyter Lab, click 'Launcher' (or the '+' if it's not there) next to the notebook tab at the top of the screen. Under \"Other\", click \"Terminal\" and then run the command.\n",
    "\n",
    "### Help us make this tutorial better! Please provide feedback on the [Discord channel](https://discord.gg/RN2a436M73) or on [X](https://x.com/harperscarroll)."
   ]
  },
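  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To get a feel for why LoRA keeps fine-tuning cheap, here is a minimal sketch of the trainable-parameter arithmetic. LoRA freezes a weight matrix of shape `d_out x d_in` and learns two low-rank factors, `B` (`d_out x r`) and `A` (`r x d_in`). As a result, only `r * (d_in + d_out)` parameters are trained for that matrix. The 2560 x 2560 shape and rank `r = 8` are illustrative assumptions (2560 is, to our knowledge, Phi-2's hidden size), not values taken from this notebook's configuration.\n",
    "\n",
    "```python\n",
    "def lora_trainable_params(d_in: int, d_out: int, r: int) -> int:\n",
    "    # LoRA trains B (d_out x r) and A (r x d_in); the original weight stays frozen\n",
    "    return r * (d_in + d_out)\n",
    "\n",
    "\n",
    "def full_params(d_in: int, d_out: int) -> int:\n",
    "    # Full fine-tuning would update every entry of the d_out x d_in matrix\n",
    "    return d_in * d_out\n",
    "\n",
    "\n",
    "# Illustrative values (assumptions): one square projection matrix, rank 8\n",
    "d, r = 2560, 8\n",
    "lora = lora_trainable_params(d, d, r)\n",
    "full = full_params(d, d)\n",
    "print(f'LoRA: {lora:,} trainable params vs. full fine-tuning: {full:,} ({lora / full:.2%})')\n",
    "```"
   ]
  },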
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "G9TytWkb3m15"
   },
   "source": [
    "#### Before we begin: A note on OOM errors\n",
    "\n",
    "If you get an error like `OutOfMemoryError: CUDA out of memory`, tweak your parameters to make training less memory-intensive. I'll point out which parameters to adjust throughout this guide, and if you have any additional questions you can reach out on the [Discord channel](https://discord.gg/RN2a436M73) or on [X](https://x.com/harperscarroll).\n",
    "\n",
    "To retry after you tweak your parameters, open a Terminal ('Launcher' or '+' in the nav bar above -> Other -> Terminal) and run `nvidia-smi`. Find the process ID (`PID`) under `Processes` and run `kill [PID]`. You will then need to restart your notebook from the beginning. (There may be a better way to do this... if so, please let me know!)"
   ]
  },
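  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If you'd rather not read PIDs off the `nvidia-smi` table by hand, the minimal sketch below lists GPU compute processes and their memory use. It assumes `nvidia-smi` is on your `PATH` and supports `--query-compute-apps=pid,used_memory` with `--format=csv,noheader,nounits`. The helper names are hypothetical. This notebook's own kernel is usually one of the listed processes, so restarting the kernel (Kernel -> Restart Kernel in Jupyter Lab) also frees its GPU memory.\n",
    "\n",
    "```python\n",
    "import subprocess\n",
    "\n",
    "\n",
    "def parse_gpu_processes(csv_text: str) -> list[tuple[int, int]]:\n",
    "    # Each non-empty line looks like 'PID, USED_MEMORY_MIB' (noheader, nounits)\n",
    "    processes = []\n",
    "    for line in csv_text.strip().splitlines():\n",
    "        if not line.strip():\n",
    "            continue\n",
    "        pid, used_mib = (field.strip() for field in line.split(','))\n",
    "        processes.append((int(pid), int(used_mib)))\n",
    "    return processes\n",
    "\n",
    "\n",
    "def list_gpu_processes() -> list[tuple[int, int]]:\n",
    "    # Query compute processes only; requires nvidia-smi on the PATH\n",
    "    result = subprocess.run(\n",
    "        ['nvidia-smi', '--query-compute-apps=pid,used_memory', '--format=csv,noheader,nounits'],\n",
    "        capture_output=True, text=True, check=True,\n",
    "    )\n",
    "    return parse_gpu_processes(result.stdout)\n",
    "\n",
    "\n",
    "sample = '12345, 20480\\n23456, 512'  # canned output, no GPU needed\n",
    "print(parse_gpu_processes(sample))  # [(12345, 20480), (23456, 512)]\n",
    "```\n",
    "\n",
    "Once you've confirmed that a PID belongs to a stale process, `os.kill(pid, signal.SIGTERM)` is the Python equivalent of `kill [PID]`."
   ]
  },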
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "VC-9m2yv3m18"
   },
   "source": [
    "## Let's begin!\n",
    "### 0. Preparing data\n",
    "\n",
    "Before you check out a GPU, prepare your dataset for loading and training.\n",
    "\n",
    "To prepare your dataset for loading, all you need are two `.jsonl` files structured something like this:\n",
    "```\n",
    "{\"input\": \"What color is the sky?\", \"output\": \"The sky is blue.\"}\n",
    "{\"input\": \"Where is the best place to get cloud GPUs?\", \"output\": \"Brev.dev\"}\n",
    "```\n",
    "If you choose to model your data as input/output pairs, you'll want to use something like the second `formatting_func` below, which combines all your features into one input string.\n",
    "\n",
    "As you can see below, I have `notes.jsonl` for my `train_dataset` and `notes_validation.jsonl` for my `eval_dataset`.\n",
    "\n",
    "I used Exporter, a free local-only app, to export my Apple Notes to `.txt` files, and then I wrote a script to convert the notes into `.jsonl` format (one record per line). ChatGPT can help a LOT with this kind of script. Tell it how your data is currently formatted and how you'd like it to be formatted, then ask it to write the script in a language you know well (so you can debug it). I also broke my journal entries into smaller pieces so that each training sample was shorter (see the discussion on `max_length` and the data visualization below). I split them so that each piece kept its context intact, since I did want the model to understand context about my life. My data were ultimately formatted as:\n",
    "\n",
    "```json\n",
    "{\"note\": \"journal-entry-for-model-to-predict\"}\n",
    "{\"note\": \"journal-entry-for-model-to-predict-1\"}\n",
    "{\"note\": \"journal-entry-for-model-to-predict-2\"}\n",
    "```"
   ]
  },
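  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The conversion script depends on how your notes were exported, but a minimal sketch might look like the following. It assumes a folder of UTF-8 `.txt` files. It splits each note on blank lines into chunks of at most `max_chars` characters so that each piece stays self-contained. It then writes one record with a single `note` field per chunk to `notes.jsonl` and `notes_validation.jsonl`. The helper names, the 1,500-character limit, and the 90/10 split are illustrative assumptions, not the author's actual script.\n",
    "\n",
    "```python\n",
    "import json\n",
    "import random\n",
    "from pathlib import Path\n",
    "\n",
    "\n",
    "def split_into_chunks(text: str, max_chars: int) -> list[str]:\n",
    "    # Group blank-line-separated paragraphs into chunks of at most max_chars;\n",
    "    # a single paragraph longer than max_chars is kept whole rather than cut mid-thought\n",
    "    paragraphs = [p.strip() for p in text.split('\\n\\n') if p.strip()]\n",
    "    chunks, current = [], ''\n",
    "    for para in paragraphs:\n",
    "        candidate = f'{current}\\n\\n{para}' if current else para\n",
    "        if current and len(candidate) > max_chars:\n",
    "            chunks.append(current)\n",
    "            current = para\n",
    "        else:\n",
    "            current = candidate\n",
    "    if current:\n",
    "        chunks.append(current)\n",
    "    return chunks\n",
    "\n",
    "\n",
    "def write_jsonl(records: list[dict], path: Path) -> None:\n",
    "    # One JSON object per line, keeping non-ASCII characters readable\n",
    "    with open(path, 'w', encoding='utf-8') as f:\n",
    "        for record in records:\n",
    "            f.write(json.dumps(record, ensure_ascii=False) + '\\n')\n",
    "\n",
    "\n",
    "def notes_to_jsonl(notes_dir: str, out_dir: str = '.', max_chars: int = 1500,\n",
    "                   val_fraction: float = 0.1, seed: int = 0) -> None:\n",
    "    records = []\n",
    "    for txt_file in sorted(Path(notes_dir).glob('*.txt')):\n",
    "        text = txt_file.read_text(encoding='utf-8')\n",
    "        records.extend({'note': chunk} for chunk in split_into_chunks(text, max_chars))\n",
    "    # Shuffle with a fixed seed so the train/validation split is reproducible\n",
    "    random.Random(seed).shuffle(records)\n",
    "    n_val = max(1, int(len(records) * val_fraction))\n",
    "    write_jsonl(records[n_val:], Path(out_dir) / 'notes.jsonl')\n",
    "    write_jsonl(records[:n_val], Path(out_dir) / 'notes_validation.jsonl')\n",
    "```\n",
    "\n",
    "For example, `notes_to_jsonl('exported_notes')` (a hypothetical folder name) would write both files to the current directory."
   ]
  },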
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "E2CkxsA43m15"
   },
   "source": [
    "### 1. Set Up GPU\n",
    "\n",
    "I used a GPU and dev environment from [brev.dev](https://brev.dev). The whole thing cost me $1 using a 1xA10G 24GB. Click the badge below to get your preconfigured instance:\n",
    "\n",
    "[![click here to deploy](https://uohmivykqgnnbiouffke.supabase.co/storage/v1/object/public/landingpage/brevdeploynavy.svg)](https://console.brev.dev/environment/new?instance=A10G:g5.xlarge&diskStorage=256&name=phi2-finetune-own-data&file=https://github.com/brevdev/notebooks/raw/main/phi2-finetune-own-data.ipynb&python=3.10&cuda=12.0.1)\n",
    "\n",
    "A single A10G (as linked) with 24GB of GPU memory was enough for me. You may need more GPUs and/or more GPU memory if your sequence `max_length` is larger than 512.\n",
    "\n",
    "Once you've checked out your machine and landed in your instance page, select the specs you'd like (I used **Python 3.10 and CUDA 12.0.1**; these should be preconfigured for you if you use the badge above) and click the \"Build\" button to build your verb container. Give this a few minutes.\n",
    "\n",
    "A few minutes after your instance has started running, click the 'Notebook' button on the top right of your screen once it becomes active (you may need to refresh the page). You will be taken to a Jupyter Lab environment, where you can upload this notebook.\n",
    "\n",
    "\n",
    "Note: You can connect your cloud credits (AWS or GCP) by clicking \"Org: \" on the top right, and in the panel that slides over, click \"Connect AWS\" or \"Connect GCP\" under \"Connect your cloud\" and follow the instructions linked to attach your credentials."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "id": "FuXIFTFapAMI",
    "outputId": "c8ced1ad-c7b3-44ba-807b-26d7d13906bc"
   },
   "outputs": [
   ],
   "source": [
    "# You only need to run this once per machine\n",
    "!pip install -q -U bitsandbytes\n",
    "!pip install -q -U git+https://github.com/huggingface/transformers.git\n",
    "!pip install -q -U git+https://github.com/huggingface/peft.git\n",
    "!pip install -q -U git+https://github.com/huggingface/accelerate.git\n",
    "!pip install -q -U datasets scipy ipywidgets matplotlib einops"
   ]
  },
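  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Several packages above are installed straight from their GitHub `main` branches. Recording exactly which versions you ended up with can help when something breaks later. The following is a minimal sketch that uses only the standard library's `importlib.metadata`. The package list simply mirrors the install cell above.\n",
    "\n",
    "```python\n",
    "from importlib.metadata import PackageNotFoundError, version\n",
    "from typing import Optional\n",
    "\n",
    "\n",
    "def installed_versions(packages: list[str]) -> dict[str, Optional[str]]:\n",
    "    # Map each distribution name to its installed version, or None if it is missing\n",
    "    versions = {}\n",
    "    for name in packages:\n",
    "        try:\n",
    "            versions[name] = version(name)\n",
    "        except PackageNotFoundError:\n",
    "            versions[name] = None\n",
    "    return versions\n",
    "\n",
    "\n",
    "for pkg, ver in installed_versions(['bitsandbytes', 'transformers', 'peft', 'accelerate', 'datasets']).items():\n",
    "    print(f'{pkg}: {ver}')\n",
    "```"
   ]
  },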
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "05H5MIfjyRgc"
   },
   "source": [
    "#### Accelerator\n",
    "\n",
    "Set up the Accelerator. I'm not sure we really need this for QLoRA given its [description](https://huggingface.co/docs/accelerate/v0.19.0/en/usage_guides/fsdp). Fully Sharded Data Parallel shards the model across multiple GPUs, so on a single GPU it may have little effect; I have to read more about it. Still, it seems it can't hurt, and it's helpful to have the code for future reference. You can always comment out the accelerator if you want to try without it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "id": "TEzYBadkyRgd"
   },
   "outputs": [],
   "source": [
    "from accelerate import FullyShardedDataParallelPlugin, Accelerator\n",
    "from torch.distributed.fsdp.fully_sharded_data_parallel import FullOptimStateDictConfig, FullStateDictConfig\n",
    "\n",
    "fsdp_plugin = FullyShardedDataParallelPlugin(\n",
    "    state_dict_config=FullStateDictConfig(offload_to_cpu=True, rank0_only=False),\n",
    "    optim_state_dict_config=FullOptimStateDictConfig(offload_to_cpu=True, rank0_only=False),\n",
    ")\n",
    "\n",
    "accelerator = Accelerator(fsdp_plugin=fsdp_plugin)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "-9KNTJZkyRgn"
   },
   "source": [
    "#### Weights & Biases\n",
    "\n",
    "Let's use Weights & Biases to track our training metrics. You'll need to enter your API key when prompted. Feel free to skip this if you'd like, and just comment out the `wandb` parameters in the `Trainer` definition below."
   ]
  },
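  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If you do skip W&B, one option besides commenting code out is to switch it off through environment variables before training starts. The following is a minimal sketch. It assumes wandb honors its documented `WANDB_MODE=disabled` and `WANDB_PROJECT` environment variables. `configure_wandb` is a hypothetical helper. With the Hugging Face `Trainer`, you can alternatively pass `report_to='none'` in its training arguments.\n",
    "\n",
    "```python\n",
    "import os\n",
    "\n",
    "\n",
    "def configure_wandb(project: str, enabled: bool = True) -> None:\n",
    "    if enabled and project:\n",
    "        # Log runs under this project name and undo any earlier disable\n",
    "        os.environ['WANDB_PROJECT'] = project\n",
    "        os.environ.pop('WANDB_MODE', None)\n",
    "    else:\n",
    "        # Ask wandb to run in disabled mode without removing wandb calls from the notebook\n",
    "        os.environ['WANDB_MODE'] = 'disabled'\n",
    "```\n",
    "\n",
    "For example, call `configure_wandb('journal-finetune', enabled=False)` before creating the `Trainer`."
   ]
  },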
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "id": "DDqUNyIoyRgo"
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "\u001b[34m\u001b[1mwandb\u001b[0m: Currently logged in as: \u001b[33mharperc\u001b[0m. Use \u001b[1m`wandb login --relogin`\u001b[0m to force relogin\n"
     ]
    }
   ],
   "source": [
    "!pip install -q wandb -U\n",
    "\n",
    "import wandb, os\n",
    "wandb.login()\n",
    "\n",
    "wandb_project = \"journal-finetune\"\n",
    "if len(wandb_project) > 0:\n",
    "    os.environ[\"WANDB_PROJECT\"] = wandb_project"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2. Load Dataset"
   ]
  },
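  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Before calling `load_dataset`, you can save yourself a confusing error by checking that every line of each file parses as JSON and has the field your `formatting_func` expects. The following is a minimal sketch. `validate_jsonl` is a hypothetical helper. The default `required_keys=('note',)` matches the journal format above; use `('input', 'output')` for question/answer data.\n",
    "\n",
    "```python\n",
    "import json\n",
    "\n",
    "\n",
    "def validate_jsonl(path: str, required_keys: tuple[str, ...] = ('note',)) -> int:\n",
    "    # Return the number of valid records; raise with the offending line number otherwise\n",
    "    count = 0\n",
    "    with open(path, encoding='utf-8') as f:\n",
    "        for line_no, line in enumerate(f, start=1):\n",
    "            if not line.strip():\n",
    "                continue  # skip blank lines, e.g. a trailing newline\n",
    "            try:\n",
    "                record = json.loads(line)\n",
    "            except json.JSONDecodeError as err:\n",
    "                raise ValueError(f'{path}:{line_no}: invalid JSON ({err})') from err\n",
    "            missing = [k for k in required_keys if k not in record]\n",
    "            if missing:\n",
    "                raise ValueError(f'{path}:{line_no}: missing keys {missing}')\n",
    "            count += 1\n",
    "    return count\n",
    "```\n",
    "\n",
    "For example, run `validate_jsonl('notes.jsonl')` and `validate_jsonl('notes_validation.jsonl')` before loading; each call returns its record count."
   ]
  },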
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "id": "s6f4z8EYmcJ6"
   },
   "outputs": [],
   "source": [
    "from datasets import load_dataset\n",
    "\n",
    "# A single data file is loaded as a 'train' split, so both files use split='train'\n",
    "train_dataset = load_dataset('json', data_files='notes.jsonl', split='train')\n",
    "eval_dataset = load_dataset('json', data_files='notes_validation.jsonl', split='train')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "uhw8JiOr3m18"
   },
   "source": [
    "#### Formatting prompts\n",
    "Then create a `formatting_func` to structure training examples as prompts."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "id": "f-fJR0MlQiTD"
   },
   "outputs": [],
   "source": [
    "def formatting_func(example):\n",
    "    text = f\"### The following is a note by Eevee the Dog: {example['note']}\"\n",
    "    return text"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "sflV0DL2P64_"
   },
   "source": [
    "Here's another common one, for input/output pairs:\n",
    "\n",
    "```python\n",
    "def formatting_func(example):\n",
    "    text = f\"### Question: {example['input']}\\n ### Answer: {example['output']}\"\n",
    "    return text\n",
    "```"
   ]
  },
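  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Whichever template you choose, it's worth printing a formatted sample before tokenizing to confirm that the prompt looks the way you expect. The following is a minimal sketch. The two functions reproduce the templates above under the hypothetical names `format_note` and `format_qa`, and the sample records are made up.\n",
    "\n",
    "```python\n",
    "def format_note(example: dict) -> str:\n",
    "    # Same template as formatting_func above (journal notes)\n",
    "    return f'### The following is a note by Eevee the Dog: {example[\"note\"]}'\n",
    "\n",
    "\n",
    "def format_qa(example: dict) -> str:\n",
    "    # Same template as the input/output alternative above\n",
    "    return f'### Question: {example[\"input\"]}\\n ### Answer: {example[\"output\"]}'\n",
    "\n",
    "\n",
    "print(format_note({'note': 'Went to the park and chased a squirrel.'}))\n",
    "print(format_qa({'input': 'What color is the sky?', 'output': 'The sky is blue.'}))\n",
    "```"
   ]
  },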
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "shz8Xdv-yRgf"
   },
   "source": [
    "### 3. Load Base Model"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "MJ-5idQwzvg-"
   },
   "source": [
    "Let's now load Phi-2 using 8-bit quantization!"
   ]
  },
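  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a rough back-of-the-envelope check on why quantization matters here, the weights alone take about (number of parameters) x (bits per parameter / 8) bytes. The following is a minimal sketch for Phi-2's 2.7B parameters. It ignores activations, LoRA gradients, optimizer state, and quantization overhead, so real GPU usage will be higher.\n",
    "\n",
    "```python\n",
    "def weight_memory_gb(n_params: float, bits_per_param: int) -> float:\n",
    "    # Bytes = params * bits / 8, reported in decimal gigabytes (1e9 bytes)\n",
    "    return n_params * bits_per_param / 8 / 1e9\n",
    "\n",
    "\n",
    "PHI2_PARAMS = 2.7e9  # approximate parameter count stated above\n",
    "for label, bits in [('float16', 16), ('8-bit', 8), ('4-bit', 4)]:\n",
    "    print(f'{label:>8}: ~{weight_memory_gb(PHI2_PARAMS, bits):.2f} GB')\n",
    "```"
   ]
  },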
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "colab": {
     "referenced_widgets": [
      "45524c98039a46d5b7745ad7cb638d2f"
     ]
    },
    "id": "E0Nl5mWL0k2T",
    "outputId": "47b6b01d-e9f2-4b70-919c-17ae64993843"
   },
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "bda29a65a0d5475999140599a50e5cbe",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
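    "import torch\n",
    "from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig\n",
    "\n",
    "base_model_id = \"microsoft/phi-2\"\n",
    "# Recent transformers releases expect 8-bit loading to be requested via a BitsAndBytesConfig\n",
    "bnb_config = BitsAndBytesConfig(load_in_8bit=True)\n",
    "model = AutoModelForCausalLM.from_pretrained(base_model_id, trust_remote_code=True, torch_dtype=torch.float16, quantization_config=bnb_config)"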
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "UjNdXolqyRgf"
   },
   "source": [
    "### 4. Tokenization\n",
    "\n",
    "Set up the tokenizer. Add padding on the left as it [makes training use less memory](https://ai.stackexchange.com/questions/41485/while-fine-tuning-a-decoder-only-llm-like-llama-on-chat-dataset-what-kind-of-pa).\n",
    "\n",
    "\n",
    "For `model_max_length`, it's helpful to look at the distribution of your data lengths. Let's first tokenize without truncation or padding so we can measure that distribution."
   ]
  },
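  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make left padding concrete, here is a minimal pure-Python sketch of what padding a batch on the left does. Shorter sequences get pad tokens prepended, and the attention mask marks those positions with 0 so the model ignores them. This is an illustration only, since the tokenizer handles padding for you. The token IDs are made up, and the pad ID of 50256 is an assumption: the tokenizer cell below reuses the EOS token as the pad token, and we believe Phi-2's EOS ID is 50256.\n",
    "\n",
    "```python\n",
    "def left_pad(batch: list[list[int]], pad_id: int) -> tuple[list[list[int]], list[list[int]]]:\n",
    "    # Prepend pad_id so every sequence ends at the same position; the mask is 0 on padding\n",
    "    max_len = max(len(seq) for seq in batch)\n",
    "    input_ids, attention_mask = [], []\n",
    "    for seq in batch:\n",
    "        n_pad = max_len - len(seq)\n",
    "        input_ids.append([pad_id] * n_pad + seq)\n",
    "        attention_mask.append([0] * n_pad + [1] * len(seq))\n",
    "    return input_ids, attention_mask\n",
    "\n",
    "\n",
    "PAD_ID = 50256  # assumption: Phi-2's EOS id reused as pad; check tokenizer.eos_token_id\n",
    "ids, mask = left_pad([[11, 12, 13], [21]], PAD_ID)\n",
    "print(ids)   # [[11, 12, 13], [50256, 50256, 21]]\n",
    "print(mask)  # [[1, 1, 1], [0, 0, 1]]\n",
    "```\n",
    "\n",
    "Because every sequence ends at the last position, the most recent real token occupies the same slot across the batch. This is the position a decoder-only model reads from when it generates the next token."
   ]
  },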
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "id": "haSUDD9HyRgf",
    "outputId": "22ee95db-2974-4ab0-e0c7-444d04d3e838"
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.\n"
     ]
    }
   ],
   "source": [
    "tokenizer = AutoTokenizer.from_pretrained(\n",
    "    base_model_id,\n",
    "    padding_side=\"left\",\n",
    "    add_eos_token=True,\n",
    "    add_bos_token=True,\n",
    "    use_fast=False, # needed for now, should be fixed soon\n",
    ")\n",
    "tokenizer.pad_token = tokenizer.eos_token\n",
    "\n",
    "# Tokenize the formatted prompt as-is (no truncation/padding yet) so we can measure lengths\n",
    "def generate_and_tokenize_prompt(prompt):\n",
    "    return tokenizer(formatting_func(prompt))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "WHnKLcq4yRgg"
   },
   "source": [
    "Reformat the prompt and tokenize each sample:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "id": "S3iLAwLh3m19"
   },
   "outputs": [],
   "source": [
    "tokenized_train_dataset = train_dataset.map(generate_and_tokenize_prompt)\n",
    "tokenized_val_dataset = eval_dataset.map(generate_and_tokenize_prompt)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "O6ewk27p3m19"
   },
   "source": [
    "Let's get a distribution of our dataset lengths, so we can determine the appropriate `max_length` for our input tensors."
   ]
  },
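  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Once you have the token lengths, one simple heuristic (an assumption, not a rule) is to pick the smallest multiple of 64 that covers most of your samples and truncate the rest. The following is a minimal sketch. `suggest_max_length`, the 95% coverage target, and the multiple of 64 are all illustrative choices. In this notebook, the lengths could come from `len(x['input_ids'])` for each sample in `tokenized_train_dataset`.\n",
    "\n",
    "```python\n",
    "import math\n",
    "\n",
    "\n",
    "def suggest_max_length(lengths: list[int], coverage: float = 0.95, multiple: int = 64) -> int:\n",
    "    # Length at the requested coverage, rounded up to a multiple of `multiple`\n",
    "    if not lengths:\n",
    "        raise ValueError('lengths must be non-empty')\n",
    "    ordered = sorted(lengths)\n",
    "    idx = max(0, math.ceil(coverage * len(ordered)) - 1)\n",
    "    return math.ceil(ordered[idx] / multiple) * multiple\n",
    "\n",
    "\n",
    "lengths = [10, 50, 100, 120, 204]  # made-up token counts\n",
    "print(suggest_max_length(lengths, coverage=1.0))  # 256\n",
    "print(suggest_max_length(lengths, coverage=0.5))  # 128\n",
    "```"
   ]
  },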
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "id": "BA8M9yfC3m19",
    "outputId": "99c6d302-9bb6-47b1-cae9-a1cd870b4770"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "204\n"
     ]
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAA0kAAAIjCAYAAADWYVDIAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjguMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8g+/7EAAAACXBIWXMAAA9hAAAPYQGoP6dpAABCuElEQVR4nO3deVzU1f7H8fcIsggCruCCQkrulrkUSeWCkZplWi7XSr2aLZpr5bVVS9OsTG1R2zQrsyy17KamuJU/NTWXtMR9Z6lMEFNQOL8/ejD3jKACIoP4ej4e87jN+Z75fj/fmaPO+57v94zDGGMEAAAAAJAklXB3AQAAAABQlBCSAAAAAMBCSAIAAAAACyEJAAAAACyEJAAAAACwEJIAAAAAwEJIAgAAAAALIQkAAAAALIQkAAAAALAQkgAUayNHjpTD4SiUY7Vo0UItWrRwPl+xYoUcDoe+/PLLQjl+r169FBYWVijHyq/U1FT17dtXISEhcjgcGjx4sLtLKnCF/blfzKJFi3T99dfLx8dHDodDx48fz7HfjBkz5HA4tH///kKt73LIy7mEhYWpV69el70mAFcWQhKAK0bWF5+sh4+PjypXrqyYmBhNnjxZJ06cKJDjHD16VCNHjtTmzZsLZH8FqSjXlhsvv/yyZsyYoUcffVQff/yxHnjggfP2DQsL05133lmI1eXNrFmzNHHiRHeXcUF//vmnunTpIl9fX7399tv6+OOP5efn5+6ycuXXX3/VyJEji0VoA3Dl8XR3AQCQVy+++KLCw8N15swZJSQkaMWKFRo8eLAmTJigb775Rg0bNnT2ffbZZ/Wf//wnT/s/evSoRo0apbCwMF1//fW5ft3333+fp+Pkx4Vqe++995SZmXnZa7gUy5Yt00033aQXXnjB3aVcslmzZmnbtm1FejZs/fr1OnHihF566SVFR0dfsO8DDzygbt26ydvbu5Cqu7Bff/1Vo0aNUosWLfI8Q1rUzgXAlYeQBOCK07ZtWzVp0sT5fMSIEVq2bJnuvPNO3XXXXfrtt9/k6+srSfL09JSn5+X9q+7vv/9WqVKl5OXldVmPczElS5Z06/FzIykpSXXr1nV3GVeNpKQkSVJQUNBF+3p4eMjDw+MyV1Q4itO5AHAPLrcDUCy0atVKzz33nA4cOKBPPvnE2Z7TPUlLlixRVFSUgoKC5O/vr1q1aunpp5+W9M/9JE2bNpUk9e7d23lp34wZMyT9c99R/fr1tXHjRt16660qVaqU87Xn3pOUJSMjQ08//bRCQkLk5+enu+66S4cOHXLpc777Iux9Xqy2nO5JOnnypIYNG6bQ0FB5e3urVq1aeu2112SMcenncDg0YMAAzZ8/X/Xr15e3t7fq1aunRYsW5fyGnyMpKUl9+vRRcHCwfHx8dN111+mjjz5ybs+6T2ffvn3673//66y9IC6l+uSTT9S4cWP5+vqqbNmy6tatW7b3N+tz+/XXX9WyZUuVKlVKVapU0fjx47Pt78CBA7rrrrvk5+enihUrasiQIVq8eLEcDodWrFjh3N9///tfHThwwHku5773mZmZGjNmjKpWrSofHx+1bt1au3fvdumza9cude7cWSEhIfLx8VHVqlXVrVs3JScnX/S858yZ4zzv8uXL6/7779eRI0dczrlnz56SpKZNm8rhcFzw3puc7uPJuuTxxx9/VLNmzeTj46NrrrlGM2fOzPG1q1at0sMPP6xy5copICBADz74oP766y+Xvg6HQyNHjsx2fPvPwIwZM3TfffdJklq2bOl8j7Pe/4vJ6VyMMRo9erSqVq2qUqVKqWXLltq+fXu21545c0ajRo1SRESEfHx8VK5cOUVFRWnJkiW5OjaA4oGZJADFxgMPPKCnn35a33//vR566KEc+2zfvl133nmnGjZsqBdffFHe3t7avXu3Vq9eLUmqU6eOXnzxRT3//PPq16+fbrnlFknSzTff7NzHn3/+qbZt26pb
t266//77FRwcfMG6xowZI4fDoeHDhyspKUkTJ05UdHS0Nm/e7Jzxyo3c1GYzxuiuu+7S8uXL1adPH11//fVavHixnnzySR05ckRvvPGGS/8ff/xRc+fO1WOPPabSpUtr8uTJ6ty5sw4ePKhy5cqdt65Tp06pRYsW2r17twYMGKDw8HDNmTNHvXr10vHjxzVo0CDVqVNHH3/8sYYMGaKqVatq2LBhkqQKFSrk+vxzMmbMGD333HPq0qWL+vbtq99//11vvvmmbr31Vm3atMllBuWvv/7SHXfcoU6dOqlLly768ssvNXz4cDVo0EBt27aV9E+obNWqleLj4zVo0CCFhIRo1qxZWr58uctxn3nmGSUnJ+vw4cPO99Hf39+lz7hx41SiRAk98cQTSk5O1vjx49WjRw+tW7dOkpSenq6YmBilpaXp8ccfV0hIiI4cOaJvv/1Wx48fV2Bg4HnPe8aMGerdu7eaNm2qsWPHKjExUZMmTdLq1aud5/3MM8+oVq1aevfdd52XqNaoUSPP7/Hu3bt17733qk+fPurZs6c+/PBD9erVS40bN1a9evVc+g4YMEBBQUEaOXKk4uLiNGXKFB04cMAZknPr1ltv1cCBAzV58mQ9/fTTqlOnjiQ5/zc/nn/+eY0ePVrt2rVTu3bt9PPPP+v2229Xenq6S7+RI0dq7Nix6tu3r5o1a6aUlBRt2LBBP//8s9q0aZPv4wO4whgAuEJMnz7dSDLr168/b5/AwEDTqFEj5/MXXnjB2H/VvfHGG0aS+f3338+7j/Xr1xtJZvr06dm23XbbbUaSmTp1ao7bbrvtNufz5cuXG0mmSpUqJiUlxdn+xRdfGElm0qRJzrbq1aubnj17XnSfF6qtZ8+epnr16s7n8+fPN5LM6NGjXfrde++9xuFwmN27dzvbJBkvLy+Xti1bthhJ5s0338x2LNvEiRONJPPJJ58429LT001kZKTx9/d3Offq1aub9u3bX3B/ue27f/9+4+HhYcaMGePS/ssvvxhPT0+X9qzPbebMmc62tLQ0ExISYjp37uxse/31140kM3/+fGfbqVOnTO3atY0ks3z5cmd7+/btXd7vLFmfe506dUxaWpqzfdKkSUaS+eWXX4wxxmzatMlIMnPmzLn4m2FJT083FStWNPXr1zenTp1ytn/77bdGknn++eedbbn5M3Nu33379jnbqlevbiSZVatWOduSkpKMt7e3GTZsWLbXNm7c2KSnpzvbx48fbySZr7/+2tkmybzwwgvZjn/un4E5c+Zke89z69xzSUpKMl5eXqZ9+/YmMzPT2e/pp582klyOe9111+V6jAIovrjcDkCx4u/vf8FV7rJmFr7++ut8L3Lg7e2t3r1757r/gw8+qNKlSzuf33vvvapUqZK+++67fB0/t7777jt5eHho4MCBLu3Dhg2TMUYLFy50aY+OjnaZaWjYsKECAgK0d+/eix4nJCRE3bt3d7aVLFlSAwcOVGpqqlauXFkAZ5Pd3LlzlZmZqS5duuiPP/5wPkJCQhQREZFt9sff31/333+/87mXl5eaNWvmcn6LFi1SlSpVdNdddznbfHx8zjszeSG9e/d2uU8ta+Yv63hZM0WLFy/W33//nev9btiwQUlJSXrsscfk4+PjbG/fvr1q166t//73v3mu9ULq1q3rrF36Z/avVq1aOY6Lfv36udwb9+ijj8rT0/Oyj/WLWbp0qdLT0/X444+7zGjltOhGUFCQtm/frl27dhVihQCKGkISgGIlNTXVJZCcq2vXrmrevLn69u2r4OBgdevWTV988UWeAlOVKlXytEhDRESEy3OHw6GaNWte9qWNDxw4oMqVK2d7P7IuWTpw4IBLe7Vq1bLto0yZMtnuKcnpOBERESpRwvWflPMdp6Ds2rVLxhhFRESoQoUKLo/ffvvNuWhBlqpVq2a75Ovc8ztw4IBq1KiRrV/NmjXzXN+572eZMmUkyXm88PBwDR06VO+//77Kly+vmJgY
vf322xe9Hynr/axVq1a2bbVr1y7w9zsv4+Lcse7v769KlSq5fRnvrPfk3PoqVKjg/FyyvPjiizp+/LiuvfZaNWjQQE8++aS2bt1aaLUCKBoISQCKjcOHDys5OfmCX2h9fX21atUqLV26VA888IC2bt2qrl27qk2bNsrIyMjVcfJyH1Fune9+jdzWVBDOtxqYOWeRh6IiMzNTDodDixYt0pIlS7I9pk2b5tK/sM8vN8d7/fXXtXXrVj399NM6deqUBg4cqHr16unw4cOXpab8KKz3rTDH+oXceuut2rNnjz788EPVr19f77//vm644Qa9//777i4NQCEiJAEoNj7++GNJUkxMzAX7lShRQq1bt9aECRP066+/asyYMVq2bJnz8qy83GCeG+detmOM0e7du11WQytTpoyOHz+e7bXnzgrkpbbq1avr6NGj2S4/3LFjh3N7Qahevbp27dqVbTauoI9zrho1asgYo/DwcEVHR2d73HTTTXneZ/Xq1bVnz55sAeDcVemkghsnDRo00LPPPqtVq1bphx9+0JEjRzR16tQL1ihJcXFx2bbFxcVdtvc7N84d66mpqYqPj7/oWE9PT1d8fLxLW0H+Ocx6T86t7/fff89xRqxs2bLq3bu3PvvsMx06dEgNGzbMcUU+AMUXIQlAsbBs2TK99NJLCg8PV48ePc7b79ixY9nasn6UNS0tTZLk5+cnSTmGlvyYOXOmS1D58ssvFR8f71xRTfrnC//atWtdVtr69ttvsy1lnZfa2rVrp4yMDL311lsu7W+88YYcDofL8S9Fu3btlJCQoM8//9zZdvbsWb355pvy9/fXbbfdViDHOVenTp3k4eGhUaNGZQs1xhj9+eefed5nTEyMjhw5om+++cbZdvr0ab333nvZ+vr5+eVqqe7zSUlJ0dmzZ13aGjRooBIlSjjHYk6aNGmiihUraurUqS79Fi5cqN9++03t27fPd02X6t1339WZM2ecz6dMmaKzZ89mG+urVq3K9rpzZ5IK8s9hdHS0SpYsqTfffNNlrEycODFb33PHjb+/v2rWrHnBzwRA8cMS4ACuOAsXLtSOHTt09uxZJSYmatmyZVqyZImqV6+ub775xuVm9nO9+OKLWrVqldq3b6/q1asrKSlJ77zzjqpWraqoqChJ/3yJCwoK0tSpU1W6dGn5+fnpxhtvVHh4eL7qLVu2rKKiotS7d28lJiZq4sSJqlmzpstiAH379tWXX36pO+64Q126dNGePXv0ySefZFuyOS+1dejQQS1bttQzzzyj/fv367rrrtP333+vr7/+WoMHD87XctA56devn6ZNm6ZevXpp48aNCgsL05dffqnVq1dr4sSJF7xH7GJ2796t0aNHZ2tv1KiR2rdvr9GjR2vEiBHav3+/OnbsqNKlS2vfvn2aN2+e+vXrpyeeeCJPx3v44Yf11ltvqXv37ho0aJAqVaqkTz/91Dmm7NmNxo0b6/PPP9fQoUPVtGlT+fv7q0OHDrk+1rJlyzRgwADdd999uvbaa3X27Fl9/PHH8vDwUOfOnc/7upIlS+qVV15R7969ddttt6l79+7OJcDDwsI0ZMiQPJ1zQUpPT1fr1q3VpUsXxcXF6Z133lFUVJTLQhh9+/bVI488os6dO6tNmzbasmWLFi9erPLly7vs6/rrr5eHh4deeeUVJScny9vbW61atVLFihXzXFeFChX0xBNPaOzYsbrzzjvVrl07bdq0SQsXLsx23Lp166pFixZq3LixypYtqw0bNujLL7/UgAED8vemALgyuWdRPQDIu6xlfbMeXl5eJiQkxLRp08ZMmjTJZanpLOcuAR4bG2vuvvtuU7lyZePl5WUqV65sunfvbnbu3Onyuq+//trUrVvXeHp6uiy5fdttt5l69erlWN/5lgD/7LPPzIgRI0zFihWNr6+vad++vTlw4EC217/++uumSpUqxtvb2zRv3txs2LAh2z4vVNu5S4AbY8yJEyfMkCFDTOXKlU3J
kiVNRESEefXVV12WQTbmn2WZ+/fvn62m8y1Nfq7ExETTu3dvU758eePl5WUaNGiQ4zLleV0C3P687UefPn2c/b766isTFRVl/Pz8jJ+fn6ldu7bp37+/iYuLc/Y53+eW03u2d+9e0759e+Pr62sqVKhghg0bZr766isjyaxdu9bZLzU11fzrX/8yQUFBRpJzP1mf+7lLe+/bt8/l89q7d6/597//bWrUqGF8fHxM2bJlTcuWLc3SpUtz9f58/vnnplGjRsbb29uULVvW9OjRwxw+fNilT0EsAZ7T53XuuMx67cqVK02/fv1MmTJljL+/v+nRo4f5888/XV6bkZFhhg8fbsqXL29KlSplYmJizO7du3Mca++995655pprjIeHR56WA8/pXDIyMsyoUaNMpUqVjK+vr2nRooXZtm1btuOOHj3aNGvWzAQFBRlfX19Tu3ZtM2bMGJelzQEUfw5jiugduQAAFBETJ07UkCFDdPjwYVWpUsXd5RQ5WT9uu379ejVp0sTd5QDAJeOeJAAALKdOnXJ5fvr0aU2bNk0REREEJAC4SnBPEgAAlk6dOqlatWq6/vrrlZycrE8++UQ7duzQp59+6u7SrnqpqalKTU29YJ8KFSqcd9lyAMgtQhIAAJaYmBi9//77+vTTT5WRkaG6detq9uzZ6tq1q7tLu+q99tprGjVq1AX77Nu3z2XJcQDID+5JAgAAV4S9e/dq7969F+wTFRV1wRUuASA3CEkAAAAAYGHhBgAAAACwFPt7kjIzM3X06FGVLl3a5UcAAQAAAFxdjDE6ceKEKleurBIlzj9fVOxD0tGjRxUaGuruMgAAAAAUEYcOHVLVqlXPu73Yh6TSpUtL+ueNCAgIcHM1AAAAANwlJSVFoaGhzoxwPsU+JGVdYhcQEEBIAgAAAHDR23BYuAEAAAAALIQkAAAAALAQkgAAAADAQkgCAAAAAAshCQAAAAAshCQAAAAAsBCSAAAAAMBCSAIAAAAACyEJAAAAACyEJAAAAACwEJIAAAAAwEJIAgAAAAALIQkAAAAALIQkAAAAALAQkgAAAADAQkgCAAAAAAshCQAAAAAshCQAAAAAsBCSAAAAAMDi6e4CrjYdOri7gv9ZsMDdFQAAAABFDzNJAAAAAGAhJAEAAACAhZAEAAAAABZCEgAAAABYCEkAAAAAYCEkAQAAAICFkAQAAAAAFkISAAAAAFgISQAAAABgISQBAAAAgIWQBAAAAAAWQhIAAAAAWAhJAAAAAGAhJAEAAACAhZAEAAAAABZCEgAAAABYCEkAAAAAYCEkAQAAAICFkAQAAAAAFkISAAAAAFgISQAAAABgISQBAAAAgIWQBAAAAAAWQhIAAAAAWAhJAAAAAGAhJAEAAACAhZAEAAAAABZCEgAAAABYCEkAAAAAYCEkAQAAAICFkAQAAAAAFkISAAAAAFgISQAAAABgISQBAAAAgIWQBAAAAAAWQhIAAAAAWAhJAAAAAGAhJAEAAACAhZAEAAAAABZCEgAAAABYCEkAAAAAYCEkAQAAAICFkAQAAAAAFkISAAAAAFiKTEgaN26cHA6HBg8e7Gw7ffq0+vfvr3Llysnf31+dO3dWYmKi+4oEAAAAUOwViZC0fv16TZs2TQ0bNnRpHzJkiBYsWKA5c+Zo5cqVOnr0qDp16uSmKgEAAABcDdweklJTU9WjRw+99957KlOmjLM9OTlZH3zwgSZMmKBWrVqpcePGmj59uv7v//5Pa9euPe/+0tLSlJKS4vIAAAAAgNxye0jq37+/2rdvr+joaJf2jRs36syZMy7ttWvXVrVq1bRmzZrz7m/s2LEKDAx0PkJDQy9b7QAAAACKH7eGpNmzZ+vnn3/W2LFjs21LSEiQl5eXgoKCXNqDg4OVkJBw3n2OGDFCycnJzsehQ4cKumwAAAAAxZinuw586NAhDRo0SEuWLJGPj0+B7dfb21ve3t4Ftj8AAAAAVxe3zSRt3LhRSUlJuuGGG+Tp
6SlPT0+tXLlSkydPlqenp4KDg5Wenq7jx4+7vC4xMVEhISHuKRoAAABAsee2maTWrVvrl19+cWnr3bu3ateureHDhys0NFQlS5ZUbGysOnfuLEmKi4vTwYMHFRkZ6Y6SAQAAAFwF3BaSSpcurfr167u0+fn5qVy5cs72Pn36aOjQoSpbtqwCAgL0+OOPKzIyUjfddJM7SgYAAABwFXBbSMqNN954QyVKlFDnzp2VlpammJgYvfPOO+4uCwAAAEAx5jDGGHcXcTmlpKQoMDBQycnJCggIcHc56tDB3RX8z4IF7q4AAAAAKDy5zQZu/50kAAAAAChKCEkAAAAAYCEkAQAAAICFkAQAAAAAFkISAAAAAFgISQAAAABgISQBAAAAgIWQBAAAAAAWQhIAAAAAWAhJAAAAAGAhJAEAAACAhZAEAAAAABZCEgAAAABYCEkAAAAAYCEkAQAAAICFkAQAAAAAFkISAAAAAFgISQAAAABgISQBAAAAgIWQBAAAAAAWQhIAAAAAWAhJAAAAAGAhJAEAAACAhZAEAAAAABZCEgAAAABYCEkAAAAAYCEkAQAAAICFkAQAAAAAFkISAAAAAFgISQAAAABgISQBAAAAgIWQBAAAAAAWQhIAAAAAWAhJAAAAAGAhJAEAAACAhZAEAAAAABZCEgAAAABYCEkAAAAAYCEkAQAAAICFkAQAAAAAFkISAAAAAFgISQAAAABgISQBAAAAgIWQBAAAAAAWQhIAAAAAWAhJAAAAAGAhJAEAAACAhZAEAAAAABZCEgAAAABYCEkAAAAAYCEkAQAAAICFkAQAAAAAFkISAAAAAFgISQAAAABgISQBAAAAgIWQBAAAAAAWQhIAAAAAWAhJAAAAAGAhJAEAAACAhZAEAAAAABZCEgAAAABYCEkAAAAAYCEkAQAAAICFkAQAAAAAFkISAAAAAFgISQAAAABgISQBAAAAgIWQBAAAAAAWQhIAAAAAWAhJAAAAAGAhJAEAAACAhZAEAAAAABZCEgAAAABYCEkAAAAAYCEkAQAAAICFkAQAAAAAFkISAAAAAFgISQAAAABgISQBAAAAgIWQBAAAAAAWQhIAAAAAWAhJAAAAAGAhJAEAAACAhZAEAAAAABZCEgAAAABYCEkAAAAAYHFrSJoyZYoaNmyogIAABQQEKDIyUgsXLnRuP336tPr3769y5crJ399fnTt3VmJiohsrBgAAAFDcuTUkVa1aVePGjdPGjRu1YcMGtWrVSnfffbe2b98uSRoyZIgWLFigOXPmaOXKlTp69Kg6derkzpIBAAAAFHMOY4xxdxG2smXL6tVXX9W9996rChUqaNasWbr33nslSTt27FCdOnW0Zs0a3XTTTbnaX0pKigIDA5WcnKyAgIDLWXqudOjg7gr+Z8ECd1cAAAAAFJ7cZoMic09SRkaGZs+erZMnTyoyMlIbN27UmTNnFB0d7exTu3ZtVatWTWvWrDnvftLS0pSSkuLyAAAAAIDccntI+uWXX+Tv7y9vb2898sgjmjdvnurWrauEhAR5eXkpKCjIpX9wcLASEhLOu7+xY8cqMDDQ+QgNDb3MZwAAAACgOHF7SKpVq5Y2b96sdevW6dFHH1XPnj3166+/5nt/I0aMUHJysvNx6NChAqwWAAAAQHHn6e4CvLy8VLNmTUlS48aNtX79ek2aNEldu3ZVenq6jh8/7jKblJiYqJCQkPPuz9vbW97e3pe7bAAAAADFlNtnks6VmZmptLQ0NW7cWCVLllRsbKxzW1xcnA4ePKjIyEg3VggAAACgOHPrTNKIESPUtm1bVatWTSdOnNCsWbO0YsUKLV68WIGBgerTp4+GDh2qsmXLKiAgQI8//rgiIyNzvbIdAAAAAOSVW0NSUlKSHnzwQcXHxyswMFANGzbU4sWL1aZNG0nSG2+8oRIlSqhz585KS0tTTEyM3nnnHXeWDAAAAKCYK3K/k1TQ+J2k8+N3kgAAAHA1ueJ+JwkAAAAAigJC
EgAAAABYCEkAAAAAYCEkAQAAAICFkAQAAAAAFkISAAAAAFgISQAAAABgISQBAAAAgIWQBAAAAAAWQhIAAAAAWAhJAAAAAGAhJAEAAACAhZAEAAAAABZCEgAAAABYCEkAAAAAYCEkAQAAAICFkAQAAAAAFkISAAAAAFgISQAAAABgISQBAAAAgIWQBAAAAAAWQhIAAAAAWAhJAAAAAGDJV0jau3dvQdcBAAAAAEVCvkJSzZo11bJlS33yySc6ffp0QdcEAAAAAG6Tr5D0888/q2HDhho6dKhCQkL08MMP66effiro2gAAAACg0OUrJF1//fWaNGmSjh49qg8//FDx8fGKiopS/fr1NWHCBP3+++8FXScAAAAAFIpLWrjB09NTnTp10pw5c/TKK69o9+7deuKJJxQaGqoHH3xQ8fHxBVUnAAAAABSKSwpJGzZs0GOPPaZKlSppwoQJeuKJJ7Rnzx4tWbJER48e1d13311QdQIAAABAofDMz4smTJig6dOnKy4uTu3atdPMmTPVrl07lSjxT+YKDw/XjBkzFBYWVpC1AgAAAMBll6+QNGXKFP373/9Wr169VKlSpRz7VKxYUR988MElFQcAAAAAhS1fIWnXrl0X7ePl5aWePXvmZ/cAAAAA4Db5uidp+vTpmjNnTrb2OXPm6KOPPrrkogAAAADAXfIVksaOHavy5ctna69YsaJefvnlSy4KAAAAANwlXyHp4MGDCg8Pz9ZevXp1HTx48JKLAgAAAAB3yVdIqlixorZu3ZqtfcuWLSpXrtwlFwUAAAAA7pKvkNS9e3cNHDhQy5cvV0ZGhjIyMrRs2TINGjRI3bp1K+gaAQAAAKDQ5Gt1u5deekn79+9X69at5en5zy4yMzP14IMPck8SAAAAgCtavkKSl5eXPv/8c7300kvasmWLfH191aBBA1WvXr2g6wMAAACAQpWvkJTl2muv1bXXXltQtQAAAACA2+UrJGVkZGjGjBmKjY1VUlKSMjMzXbYvW7asQIoDAAAAgMKWr5A0aNAgzZgxQ+3bt1f9+vXlcDgKui4AAAAAcIt8haTZs2friy++ULt27Qq6HgAAAABwq3wtAe7l5aWaNWsWdC0AAAAA4Hb5CknDhg3TpEmTZIwp6HoAAAAAwK3ydbndjz/+qOXLl2vhwoWqV6+eSpYs6bJ97ty5BVIcAAAAABS2fIWkoKAg3XPPPQVdCwAAAAC4Xb5C0vTp0wu6DgAAAAAoEvJ1T5IknT17VkuXLtW0adN04sQJSdLRo0eVmppaYMUBAAAAQGHL10zSgQMHdMcdd+jgwYNKS0tTmzZtVLp0ab3yyitKS0vT1KlTC7pOAAAAACgU+ZpJGjRokJo0aaK//vpLvr6+zvZ77rlHsbGxBVYcAAAAABS2fM0k/fDDD/q///s/eXl5ubSHhYXpyJEjBVIYAAAAALhDvmaSMjMzlZGRka398OHDKl269CUXBQAAAADukq+QdPvtt2vixInO5w6HQ6mpqXrhhRfUrl27gqoNAAAAAApdvi63e/311xUTE6O6devq9OnT+te//qVdu3apfPny+uyzzwq6RgAAAAAoNPkKSVWrVtWWLVs0e/Zsbd26VampqerTp4969OjhspADAAAAAFxp8hWSJMnT01P3339/QdYCAAAAAG6Xr5A0c+bMC25/8MEH81UMAAAAALhbvkLSoEGDXJ6fOXNGf//9t7y8vFSqVClCEgAAAIArVr5Wt/vrr79cHqmpqYqLi1NUVBQLNwAAAAC4ouUrJOUkIiJC48aNyzbLBAAAAABXkgILSdI/izkcPXq0IHcJAAAAAIUqX/ckffPNNy7PjTGKj4/XW2+9pebNmxdIYQAAAADgDvkKSR07dnR57nA4VKFCBbVq1Uqvv/56QdQFAAAAAG6Rr5CUmZlZ0HUAAAAAQJFQoPckAQAAAMCVLl8zSUOHDs113wkTJuTnEAAAAADgFvkKSZs2bdKmTZt05swZ1apVS5K0c+dO
eXh46IYbbnD2czgcBVMlAAAAABSSfIWkDh06qHTp0vroo49UpkwZSf/8wGzv3r11yy23aNiwYQVaJAAAAAAUFocxxuT1RVWqVNH333+vevXqubRv27ZNt99+e5H6raSUlBQFBgYqOTlZAQEB7i5HHTq4u4L/WbDA3RUAAAAAhSe32SBfCzekpKTo999/z9b++++/68SJE/nZJQAAAAAUCfkKSffcc4969+6tuXPn6vDhwzp8+LC++uor9enTR506dSroGgEAAACg0OTrnqSpU6fqiSee0L/+9S+dOXPmnx15eqpPnz569dVXC7RAAAAAAChM+bonKcvJkye1Z88eSVKNGjXk5+dXYIUVFO5JOj/uSQIAAMDV5LLek5QlPj5e8fHxioiIkJ+fny4hbwEAAABAkZCvkPTnn3+qdevWuvbaa9WuXTvFx8dLkvr06cPy3wAAAACuaPkKSUOGDFHJkiV18OBBlSpVytnetWtXLVq0qMCKAwAAAIDClq+FG77//nstXrxYVatWdWmPiIjQgQMHCqQwAAAAAHCHfM0knTx50mUGKcuxY8fk7e19yUUBAAAAgLvkKyTdcsstmjlzpvO5w+FQZmamxo8fr5YtWxZYcQAAAABQ2PJ1ud348ePVunVrbdiwQenp6Xrqqae0fft2HTt2TKtXry7oGgEAAACg0ORrJql+/frauXOnoqKidPfdd+vkyZPq1KmTNm3apBo1ahR0jQAAAABQaPI8k3TmzBndcccdmjp1qp555pnLURMAAAAAuE2eZ5JKliyprVu3Xo5aAAAAAMDt8nW53f33368PPvigoGsBAAAAALfL18INZ8+e1YcffqilS5eqcePG8vPzc9k+YcKEAikOAAAAAApbnkLS3r17FRYWpm3btumGG26QJO3cudOlj8PhKLjqAAAAAKCQ5elyu4iICP3xxx9avny5li9frooVK2r27NnO58uXL9eyZctyvb+xY8eqadOmKl26tCpWrKiOHTsqLi7Opc/p06fVv39/lStXTv7+/urcubMSExPzUjYAAAAA5FqeQpIxxuX5woULdfLkyXwffOXKlerfv7/Wrl2rJUuW6MyZM7r99ttd9jlkyBAtWLBAc+bM0cqVK3X06FF16tQp38cEAAAAgAvJ1z1JWc4NTXm1aNEil+czZsxQxYoVtXHjRt16661KTk7WBx98oFmzZqlVq1aSpOnTp6tOnTpau3atbrrppks6PgAAAACcK08zSQ6HI9s9RwV5D1JycrIkqWzZspKkjRs36syZM4qOjnb2qV27tqpVq6Y1a9bkuI+0tDSlpKS4PAAAAAAgt/I0k2SMUa9eveTt7S3pn/uFHnnkkWyr282dOzfPhWRmZmrw4MFq3ry56tevL0lKSEiQl5eXgoKCXPoGBwcrISEhx/2MHTtWo0aNyvPxAQAAAEDKY0jq2bOny/P777+/wArp37+/tm3bph9//PGS9jNixAgNHTrU+TwlJUWhoaGXWh4AAACAq0SeQtL06dMvSxEDBgzQt99+q1WrVqlq1arO9pCQEKWnp+v48eMus0mJiYkKCQnJcV/e3t7OmS4AAAAAyKs83ZNU0IwxGjBggObNm6dly5YpPDzcZXvjxo1VsmRJxcbGOtvi4uJ08OBBRUZGFna5AAAAAK4Cl7S63aXq37+/Zs2apa+//lqlS5d23mcUGBgoX19fBQYGqk+fPho6dKjKli2rgIAAPf7444qMjGRlOwAAAACXhVtD0pQpUyRJLVq0cGmfPn26evXqJUl64403VKJECXXu3FlpaWmKiYnRO++8U8iVAgAAALhaOMyl/thREZeSkqLAwEAlJycrICDA3eWoQwd3V/A/Cxa4uwIAAACg8OQ2G7j1niQAAAAAKGoISQAAAABgISQBAAAAgMWtCzcANu7XKvr4jAAAwNWAmSQAAAAAsBCSAAAAAMBCSAIAAAAACyEJAAAAACyEJAAAAACwEJIAAAAAwEJIAgAAAAALIQkA
AAAALIQkAAAAALAQkgAAAADAQkgCAAAAAAshCQAAAAAshCQAAAAAsBCSAAAAAMBCSAIAAAAACyEJAAAAACyEJAAAAACwEJIAAAAAwOLp7gLgPh06uLsCAAAAoOhhJgkAAAAALIQkAAAAALAQkgAAAADAQkgCAAAAAAshCQAAAAAshCQAAAAAsBCSAAAAAMBCSAIAAAAACyEJAAAAACyEJAAAAACwEJIAAAAAwEJIAgAAAAALIQkAAAAALIQkAAAAALAQkgAAAADAQkgCAAAAAAshCQAAAAAshCQAAAAAsBCSAAAAAMDi6e4CgKKoQwd3VwAAAAB3YSYJAAAAACyEJAAAAACwEJIAAAAAwEJIAgAAAAALIQkAAAAALIQkAAAAALAQkgAAAADAQkgCAAAAAAshCQAAAAAsnu4uAADyo0MHd1dQdC1Y4O4K/qcofU5F6X0BABRtzCQBAAAAgIWQBAAAAAAWQhIAAAAAWAhJAAAAAGAhJAEAAACAhZAEAAAAABZCEgAAAABYCEkAAAAAYCEkAQAAAICFkAQAAAAAFkISAAAAAFgISQAAAABgISQBAAAAgIWQBAAAAAAWQhIAAAAAWAhJAAAAAGAhJAEAAACAhZAEAAAAABZCEgAAAABYCEkAAAAAYCEkAQAAAICFkAQAAAAAFkISAAAAAFgISQAAAABgISQBAAAAgIWQBAAAAAAWQhIAAAAAWAhJAAAAAGDxdHcBAICC1aGDuysAAODKxkwSAAAAAFgISQAAAABgISQBAAAAgIWQBAAAAAAWQhIAAAAAWNwaklatWqUOHTqocuXKcjgcmj9/vst2Y4yef/55VapUSb6+voqOjtauXbvcUywAAACAq4JbQ9LJkyd13XXX6e23385x+/jx4zV58mRNnTpV69atk5+fn2JiYnT69OlCrhQAAADA1cKtv5PUtm1btW3bNsdtxhhNnDhRzz77rO6++25J0syZMxUcHKz58+erW7duhVkqAAAAgKtEkb0nad++fUpISFB0dLSzLTAwUDfeeKPWrFlz3telpaUpJSXF5QEAAAAAuVVkQ1JCQoIkKTg42KU9ODjYuS0nY8eOVWBgoPMRGhp6WesEAAAAULwU2ZCUXyNGjFBycrLzcejQIXeXBAAAAOAKUmRDUkhIiCQpMTHRpT0xMdG5LSfe3t4KCAhweQAAAABAbhXZkBQeHq6QkBDFxsY621JSUrRu3TpFRka6sTIAAAAAxZlbV7dLTU3V7t27nc/37dunzZs3q2zZsqpWrZoGDx6s0aNHKyIiQuHh4XruuedUuXJldezY0X1FAwAAACjW3BqSNmzYoJYtWzqfDx06VJLUs2dPzZgxQ0899ZROnjypfv366fjx44qKitKiRYvk4+PjrpIBAAAAFHMOY4xxdxGXU0pKigIDA5WcnFwk7k/q0MHdFQDA1WnBAndXAABwt9xmgyJ7TxIAAAAAuAMhCQAAAAAshCQAAAAAsBCSAAAAAMBCSAIAAAAACyEJAAAAACyEJAAAAACwEJIAAAAAwEJIAgAAAAALIQkAAAAALIQkAAAAALAQkgAAAADAQkgCAAAAAAshCQAAAAAshCQAAAAAsBCSAAAAAMBCSAIAAAAACyEJAAAAACyEJAAAAACwEJIAAAAAwEJIAgAAAAALIQkAAAAALIQkAAAAALAQkgAAAADAQkgCAAAAAAshCQAAAAAshCQAAAAAsBCSAAAAAMBCSAIAAAAACyEJAAAAACyEJAAAAACwEJIAAAAAwEJIAgAAAAALIQkAAAAALIQkAAAAALAQkgAAAADAQkgCAAAAAAshCQAAAAAshCQAAAAAsBCSAAAAAMBCSAIAAAAACyEJAAAAACye7i4AAIDC0KGDuytAbixY4O4KAICZJAAAAABwQUgCAAAAAAshCQAAAAAshCQAAAAAsBCSAAAAAMBCSAIAAAAACyEJAAAAACyEJAAAAACwEJIAAAAAwEJIAgAAAAALIQkAAAAALIQk
AAAAALAQkgAAAADAQkgCAAAAAAshCQAAAAAshCQAAAAAsBCSAAAAAMBCSAIAAAAAi6e7CwAAAABQsDp0cHcFrhYscHcFecNMEgAAAABYCEkAAAAAYCEkAQAAAICFkAQAAAAAFkISAAAAAFgISQAAAABgISQBAAAAgIWQBAAAAAAWQhIAAAAAWDzdXQAAAECWDh3cXcH/LFjg7goAuAszSQAAAABgISQBAAAAgIWQBAAAAAAWQhIAAAAAWAhJAAAAAGAhJAEAAACAhZAEAAAAABZCEgAAAABYCEkAAAAAYPF0dwEAAAC4uA4d3F3B/yxY4O4KgMuLmSQAAAAAsBCSAAAAAMBCSAIAAAAACyEJAAAAACyEJAAAAACwXBEh6e2331ZYWJh8fHx044036qeffnJ3SQAAAACKqSIfkj7//HMNHTpUL7zwgn7++Wddd911iomJUVJSkrtLAwAAAFAMFfmQNGHCBD300EPq3bu36tatq6lTp6pUqVL68MMP3V0aAAAAgGKoSP+YbHp6ujZu3KgRI0Y420qUKKHo6GitWbMmx9ekpaUpLS3N+Tw5OVmSlJKScnmLzaUzZ9xdAQAAyI0i8tXBqSh9hyhq7w2yK0rjRSo6YyYrExhjLtivSIekP/74QxkZGQoODnZpDw4O1o4dO3J8zdixYzVq1Khs7aGhoZelRgAAUDwFBrq7gqKL9wZ5VdTGzIkTJxR4gaKKdEjKjxEjRmjo0KHO55mZmTp27JjKlSsnh8NRoMdKSUlRaGioDh06pICAgALdN4oPxgkuhjGC3GCcIDcYJ8iNq3mcGGN04sQJVa5c+YL9inRIKl++vDw8PJSYmOjSnpiYqJCQkBxf4+3tLW9vb5e2oKCgy1WiJCkgIOCqG2DIO8YJLoYxgtxgnCA3GCfIjat1nFxoBilLkV64wcvLS40bN1ZsbKyzLTMzU7GxsYqMjHRjZQAAAACKqyI9kyRJQ4cOVc+ePdWkSRM1a9ZMEydO1MmTJ9W7d293lwYAAACgGCryIalr1676/fff9fzzzyshIUHXX3+9Fi1alG0xB3fw9vbWCy+8kO3yPsDGOMHFMEaQG4wT5AbjBLnBOLk4h7nY+ncAAAAAcBUp0vckAQAAAEBhIyQBAAAAgIWQBAAAAAAWQhIAAAAAWAhJ+fT2228rLCxMPj4+uvHGG/XTTz+5uyQUkrFjx6pp06YqXbq0KlasqI4dOyouLs6lz+nTp9W/f3+VK1dO/v7+6ty5c7YfRT548KDat2+vUqVKqWLFinryySd19uzZwjwVFKJx48bJ4XBo8ODBzjbGCSTpyJEjuv/++1WuXDn5+vqqQYMG2rBhg3O7MUbPP/+8KlWqJF9fX0VHR2vXrl0u+zh27Jh69OihgIAABQUFqU+fPkpNTS3sU8FlkpGRoeeee07h4eHy9fVVjRo19NJLL8lee4txcvVZtWqVOnTooMqVK8vhcGj+/Pku2wtqTGzdulW33HKLfHx8FBoaqvHjx1/uUysaDPJs9uzZxsvLy3z44Ydm+/bt5qGHHjJBQUEmMTHR3aWhEMTExJjp06ebbdu2mc2bN5t27dqZatWqmdTUVGefRx55xISGhprY2FizYcMGc9NNN5mbb77Zuf3s2bOmfv36Jjo62mzatMl89913pnz58mbEiBHuOCVcZj/99JMJCwszDRs2NIMGDXK2M05w7NgxU716ddOrVy+zbt06s3fvXrN48WKze/duZ59x48aZwMBAM3/+fLNlyxZz1113mfDwcHPq1ClnnzvuuMNcd911Zu3ateaHH34wNWvWNN27d3fHKeEyGDNmjClXrpz59ttvzb59+8ycOXOMv7+/mTRpkrMP4+Tq891335lnnnnGzJ0710gy8+bNc9leEGMiOTnZBAcHmx49epht27aZzz77zPj6+ppp06YV1mm6DSEpH5o1a2b69+/vfJ6RkWEqV65sxo4d68aq4C5JSUlGklm5cqUxxpjjx4+bkiVLmjlz
5jj7/Pbbb0aSWbNmjTHmn7/YSpQoYRISEpx9pkyZYgICAkxaWlrhngAuqxMnTpiIiAizZMkSc9tttzlDEuMExhgzfPhwExUVdd7tmZmZJiQkxLz66qvOtuPHjxtvb2/z2WefGWOM+fXXX40ks379emefhQsXGofDYY4cOXL5ikehad++vfn3v//t0tapUyfTo0cPYwzjBCZbSCqoMfHOO++YMmXKuPybM3z4cFOrVq3LfEbux+V2eZSenq6NGzcqOjra2VaiRAlFR0drzZo1bqwM7pKcnCxJKlu2rCRp48aNOnPmjMsYqV27tqpVq+YcI2vWrFGDBg1cfhQ5JiZGKSkp2r59eyFWj8utf//+at++vct4kBgn+Mc333yjJk2a6L777lPFihXVqFEjvffee87t+/btU0JCgss4CQwM1I033ugyToKCgtSkSRNnn+joaJUoUULr1q0rvJPBZXPzzTcrNjZWO3fulCRt2bJFP/74o9q2bSuJcYLsCmpMrFmzRrfeequ8vLycfWJiYhQXF6e//vqrkM7GPTzdXcCV5o8//lBGRobLlxZJCg4O1o4dO9xUFdwlMzNTgwcPVvPmzVW/fn1JUkJCgry8vBQUFOTSNzg4WAkJCc4+OY2hrG0oHmbPnq2ff/5Z69evz7aNcQJJ2rt3r6ZMmaKhQ4fq6aef1vr16zVw4EB5eXmpZ8+ezs85p3Fgj5OKFSu6bPf09FTZsmUZJ8XEf/7zH6WkpKh27dry8PBQRkaGxowZox49ekgS4wTZFNSYSEhIUHh4eLZ9ZG0rU6bMZam/KCAkAZegf//+2rZtm3788Ud3l4Ii5tChQxo0aJCWLFkiHx8fd5eDIiozM1NNmjTRyy+/LElq1KiRtm3bpqlTp6pnz55urg5FxRdffKFPP/1Us2bNUr169bR582YNHjxYlStXZpwAlwmX2+VR+fLl5eHhkW0FqsTERIWEhLipKrjDgAED9O2332r58uWqWrWqsz0kJETp6ek6fvy4S397jISEhOQ4hrK24cq3ceNGJSUl6YYbbpCnp6c8PT21cuVKTZ48WZ6engoODmacQJUqVVLdunVd2urUqaODBw9K+t/nfKF/c0JCQpSUlOSy/ezZszp27BjjpJh48skn9Z///EfdunVTgwYN9MADD2jIkCEaO3asJMYJsiuoMXE1/ztESMojLy8vNW7cWLGxsc62zMxMxcbGKjIy0o2VobAYYzRgwADNmzdPy5YtyzYN3bhxY5UsWdJljMTFxengwYPOMRIZGalffvnF5S+nJUuWKCAgINsXJlyZWrdurV9++UWbN292Ppo0aaIePXo4/5txgubNm2f7CYGdO3eqevXqkqTw8HCFhIS4jJOUlBStW7fOZZwcP35cGzdudPZZtmyZMjMzdeONNxbCWeBy+/vvv1WihOtXNg8PD2VmZkpinCC7ghoTkZGRWrVqlc6cOePss2TJEtWqVatYX2oniSXA82P27NnG29vbzJgxw/z666+mX79+JigoyGUFKhRfjz76qAkMDDQrVqww8fHxzsfff//t7PPII4+YatWqmWXLlpkNGzaYyMhIExkZ6dyetbTz7bffbjZv3mwWLVpkKlSowNLOxZy9up0xjBP8szy8p6enGTNmjNm1a5f59NNPTalSpcwnn3zi7DNu3DgTFBRkvv76a7N161Zz991357iMb6NGjcy6devMjz/+aCIiIljauRjp2bOnqVKlinMJ8Llz55ry5cubp556ytmHcXL1OXHihNm0aZPZtGmTkWQmTJhgNm3aZA4cOGCMKZgxcfz4cRMcHGweeOABs23bNjN79mxTqlQplgDH+b355pumWrVqxsvLyzRr1sysXbvW3SWhkEjK8TF9+nRnn1OnTpnHHnvMlClTxpQqVcrcc889Jj4+3mU/+/fvN23btjW+vr6mfPnyZtiwYebMmTOFfDYoTOeGJMYJjDFmwYIFpn79+sbb29vUrl3bvPvuuy7bMzMzzXPP
PWeCg4ONt7e3ad26tYmLi3Pp8+eff5ru3bsbf39/ExAQYHr37m1OnDhRmKeByyglJcUMGjTIVKtWzfj4+JhrrrnGPPPMMy7LMjNOrj7Lly/P8ftIz549jTEFNya2bNlioqKijLe3t6lSpYoZN25cYZ2iWzmMsX6uGQAAAACuctyTBAAAAAAWQhIAAAAAWAhJAAAAAGAhJAEAAACAhZAEAAAAABZCEgAAAABYCEkAAAAAYCEkAQAAAICFkAQAcKtevXqpY8eOBb7fhIQEtWnTRn5+fgoKCirUY18OYWFhmjhx4gX7OBwOzZ8/v1DqAYDijJAEAFeBohAG9u/fL4fDoc2bNxfK8d544w3Fx8dr8+bN2rlzZ459Jk2apBkzZhRKPbYZM2acN7idz/r169WvX7/LUxAAwIWnuwsAAOBy2LNnjxo3bqyIiIjz9gkMDCzEii5NhQoV3F0CAFw1mEkCAGjbtm1q27at/P39FRwcrAceeEB//PGHc3uLFi00cOBAPfXUUypbtqxCQkI0cuRIl33s2LFDUVFR8vHxUd26dbV06VKXy7/Cw8MlSY0aNZLD4VCLFi1cXv/aa6+pUqVKKleunPr3768zZ85csOYpU6aoRo0a8vLyUq1atfTxxx87t4WFhemrr77SzJkz5XA41KtXrxz3ce4MW27O0+FwaMqUKWrbtq18fX11zTXX6Msvv3RuX7FihRwOh44fP+5s27x5sxwOh/bv368VK1aod+/eSk5OlsPhkMPhyHaMnJx7ud2uXbt06623Ot/vJUuWuPRPT0/XgAEDVKlSJfn4+Kh69eoaO3bsRY8DACAkAcBV7/jx42rVqpUaNWqkDRs2aNGiRUpMTFSXLl1c+n300Ufy8/PTunXrNH78eL344ovOL+YZGRnq2LGjSpUqpXXr1undd9/VM8884/L6n376SZK0dOlSxcfHa+7cuc5ty5cv1549e7R8+XJ99NFHmjFjxgUvg5s3b54GDRqkYcOGadu2bXr44YfVu3dvLV++XNI/l6bdcccd6tKli+Lj4zVp0qRcvx8XOs8szz33nDp37qwtW7aoR48e6tatm3777bdc7f/mm2/WxIkTFRAQoPj4eMXHx+uJJ57IdX2SlJmZqU6dOsnLy0vr1q3T1KlTNXz4cJc+kydP1jfffKMvvvhCcXFx+vTTTxUWFpan4wDA1YrL7QDgKvfWW2+pUaNGevnll51tH374oUJDQ7Vz505de+21kqSGDRvqhRdekCRFRETorbfeUmxsrNq0aaMlS5Zoz549WrFihUJCQiRJY8aMUZs2bZz7zLpcrFy5cs4+WcqUKaO33npLHh4eql27ttq3b6/Y2Fg99NBDOdb82muvqVevXnrsscckSUOHDtXatWv12muvqWXLlqpQoYK8vb3l6+ub7VgXc6HzzHLfffepb9++kqSXXnpJS5Ys0Ztvvql33nnnovv38vJSYGCgHA5HnmvLsnTpUu3YsUOLFy9W5cqVJUkvv/yy2rZt6+xz8OBBRUREKCoqSg6HQ9WrV8/XsQDgasRMEgBc5bZs2aLly5fL39/f+ahdu7akf+7rydKwYUOX11WqVElJSUmSpLi4OIWGhrp86W/WrFmua6hXr548PDxy3HdOfvvtNzVv3tylrXnz5rmezbmQC51nlsjIyGzPC+LYufXbb78pNDTUGZByqqlXr17avHmzatWqpYEDB+r7778vtPoA4ErHTBIAXOVSU1PVoUMHvfLKK9m2VapUyfnfJUuWdNnmcDiUmZlZIDVczn0Xdi0lSvzz/z8aY5xtF7u/6nK44YYbtG/fPi1cuFBLly5Vly5dFB0d7XL/FAAgZ8wkAcBV7oYbbtD27dsVFhammjVrujz8/PxytY9atWrp0KFDSkxMdLatX7/epY+Xl5ekf+5fulR16tTR6tWrXdpWr16tunXrXvK+c2Pt2rXZntepU0fS/y4rjI+Pd24/d9lzLy+vS3of6tSpo0OHDrkc49yaJCkg
IEBdu3bVe++9p88//1xfffWVjh07lu/jAsDVgpkkALhKJCcnZ/uynrWS3Hvvvafu3bs7V3XbvXu3Zs+erffff9/lMrjzadOmjWrUqKGePXtq/PjxOnHihJ599llJ/8zESFLFihXl6+urRYsWqWrVqvLx8cn3EtxPPvmkunTpokaNGik6OloLFizQ3LlztXTp0nztL6/mzJmjJk2aKCoqSp9++ql++uknffDBB5KkmjVrKjQ0VCNHjtSYMWO0c+dOvf766y6vDwsLU2pqqmJjY3XdddepVKlSKlWqVK6PHx0drWuvvVY9e/bUq6++qpSUlGwLZUyYMEGVKlVSo0aNVKJECc2ZM0chISF5/n0mALgaMZMEAFeJFStWqFGjRi6PUaNGqXLlylq9erUyMjJ0++23q0GDBho8eLCCgoKcl45djIeHh+bPn6/U1FQ1bdpUffv2dX5p9/HxkSR5enpq8uTJmjZtmipXrqy777473+fSsWNHTZo0Sa+99prq1aunadOmafr06dmWFb9cRo0apdmzZ6thw4aaOXOmPvvsM+csVsmSJfXZZ59px44datiwoV555RWNHj3a5fU333yzHnnkEXXt2lUVKlTQ+PHj83T8EiVKaN68eTp16pSaNWumvn37asyYMS59SpcurfHjx6tJkyZq2rSp9u/fr++++y7XnykAXM0cxr5oGgCAArJ69WpFRUVp9+7dqlGjhrvLKTAOh0Pz5s1z+X0lAEDxwuV2AIACMW/ePPn7+ysiIkK7d+/WoEGD1Lx582IVkAAAVwdCEgCgQJw4cULDhw/XwYMHVb58eUVHR2e7Fwc5++GHH1x+4+hcqamphVgNAIDL7QAAcLNTp07pyJEj591es2bNQqwGAEBIAgAAAAALS9wAAAAAgIWQBAAAAAAWQhIAAAAAWAhJAAAAAGAhJAEAAACAhZAEAAAAABZCEgAAAABY/h92VmvvetNxDAAAAABJRU5ErkJggg==",
      "text/plain": [
       "<Figure size 1000x600 with 1 Axes>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "\n",
    "def plot_data_lengths(tokenized_train_dataset, tokenized_val_dataset):\n",
    "    lengths = [len(x['input_ids']) for x in tokenized_train_dataset]\n",
    "    lengths += [len(x['input_ids']) for x in tokenized_val_dataset]\n",
    "    print(len(lengths))\n",
    "\n",
    "    # Plotting the histogram\n",
    "    plt.figure(figsize=(10, 6))\n",
    "    plt.hist(lengths, bins=20, alpha=0.7, color='blue')\n",
    "    plt.xlabel('Length of input_ids')\n",
    "    plt.ylabel('Frequency')\n",
    "    plt.title('Distribution of Lengths of input_ids')\n",
    "    plt.show()\n",
    "\n",
    "plot_data_lengths(tokenized_train_dataset, tokenized_val_dataset)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "nBk4Qp_vyRgh"
   },
   "source": [
    "From here, you can choose the `max_length` to use. Training examples longer than it are truncated, and shorter ones are padded up to it. Be aware that a larger `max_length` keeps more of each sample but increases memory use and training time.\n",
    "\n",
    "I'm using my personal notes to train the model, and they vary greatly in length. I spent some time cleaning the dataset so the samples were roughly the same length, splitting individual notes where needed while taking care not to cut in the middle of a word or sentence."
   ]
  },
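  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One way to turn the histogram into a number is to take a high percentile of the token lengths and round it up to a multiple of 64, so that only a small share of samples gets truncated. The sketch below assumes you have a list of lengths like the `lengths` list built inside `plot_data_lengths`; the percentile, the multiple of 64, and the helper names are illustrative choices, not part of this tutorial's pipeline.\n",
    "\n",
    "```python\n",
    "import math\n",
    "\n",
    "# Hypothetical helper: smallest multiple of `multiple` that covers `percentile`% of samples\n",
    "def suggest_max_length(lengths, percentile=95, multiple=64):\n",
    "    ordered = sorted(lengths)\n",
    "    # Nearest-rank percentile: the smallest length that at least `percentile`% of samples do not exceed\n",
    "    rank = max(1, math.ceil(percentile * len(ordered) / 100))\n",
    "    cutoff = ordered[rank - 1]\n",
    "    # Round up so the sequence length is a multiple of `multiple` tokens\n",
    "    return math.ceil(cutoff / multiple) * multiple\n",
    "\n",
    "# Fraction of samples longer than max_length, i.e. the ones that would be truncated\n",
    "def truncated_fraction(lengths, max_length):\n",
    "    return sum(1 for n in lengths if n > max_length) / len(lengths)\n",
    "\n",
    "# Made-up lengths for illustration only\n",
    "lengths = [120, 180, 240, 300, 310, 350, 400, 420, 480, 900]\n",
    "m = suggest_max_length(lengths, percentile=90)\n",
    "print(m, truncated_fraction(lengths, m))  # prints: 512 0.1\n",
    "```\n",
    "\n",
    "Raising the percentile truncates fewer samples at the cost of longer, more expensive sequences."
   ]
  },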
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "bMlw8h743m19"
   },
   "source": [
    "Now let's tokenize again, this time with padding and truncation, and set up the tokenize function so that `labels` is a copy of `input_ids`. Using the text itself as the training target this way is essentially what [self-supervised fine-tuning](https://neptune.ai/blog/self-supervised-learning) is."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "id": "acINaViR3m19"
   },
   "outputs": [],
   "source": [
    "max_length = 512 # This was an appropriate max length for my dataset\n",
    "\n",
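    "def generate_and_tokenize_prompt2(prompt):\n",
    "    # Truncate longer samples and pad shorter ones so every example is exactly max_length tokens\n",
    "    result = tokenizer(\n",
    "        formatting_func(prompt),\n",
    "        truncation=True,\n",
    "        max_length=max_length,\n",
    "        padding=\"max_length\",\n",
    "    )\n",
    "    # Self-supervised objective: the labels are a copy of the input tokens\n",
    "    result[\"labels\"] = result[\"input_ids\"].copy()\n",
    "    return result"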
   ]
  },
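  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Setting `labels` equal to `input_ids` works because Hugging Face causal language models shift the labels by one position internally: the prediction made at each position is scored against the token that follows it. The plain-Python sketch below only illustrates that pairing on made-up token IDs; `next_token_pairs` is a hypothetical helper, not the library's implementation.\n",
    "\n",
    "```python\n",
    "# Hypothetical helper: pair each position's current token with the token it must predict.\n",
    "# The model sees every token up to position i and is scored against labels[i + 1].\n",
    "def next_token_pairs(input_ids, labels):\n",
    "    contexts = input_ids[:-1]  # the last position has no next token to predict\n",
    "    targets = labels[1:]       # the first token is never a prediction target\n",
    "    return list(zip(contexts, targets))\n",
    "\n",
    "# Made-up token IDs for illustration\n",
    "input_ids = [464, 3797, 3332, 319]\n",
    "labels = input_ids.copy()  # same idea as result['labels'] = result['input_ids'].copy()\n",
    "for current, target in next_token_pairs(input_ids, labels):\n",
    "    print(f'after token {current}, predict {target}')\n",
    "```"
   ]
  },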
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "colab": {
     "referenced_widgets": [
      "518d4f0b89bf4d57bf00d4c6d6e59eb5"
     ]
    },
    "id": "lTk-aTog3m19",
    "outputId": "4fb637b4-77a2-47c6-de7b-4fb620663dd7"
   },
   "outputs": [],
   "source": [
    "tokenized_train_dataset = train_dataset.map(generate_and_tokenize_prompt2)\n",
    "tokenized_val_dataset = eval_dataset.map(generate_and_tokenize_prompt2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "TQL796OayRgh"
   },
   "source": [
    "Ideally, each `input_ids` sequence should be padded on the left with the `eos_token` (ID 50256), end with an added `eos_token` (50256), and start with a `bos_token` (?). However, I'm getting an error with Phi-2's tokenizer when I try this. GPU credits for whoever can resolve this!\n",
    "\n",
    "Hopefully it works just fine as-is."
   ]
  },
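  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One possible workaround, untested against Phi-2's tokenizer, is to add the special tokens to the token IDs yourself after tokenizing without them. A minimal sketch, assuming `eos_token_id = 50256` as noted above and (unverified) that the same ID can also serve as the BOS and padding token; check `tokenizer.bos_token_id` and `tokenizer.pad_token_id` before relying on this. `add_special_and_pad` is a hypothetical helper, not a `transformers` function.\n",
    "\n",
    "```python\n",
    "# Hypothetical helper: build [pad ... pad, bos, tokens..., eos] of exactly max_length IDs.\n",
    "# The default IDs are assumptions; replace them with your tokenizer's actual values.\n",
    "def add_special_and_pad(token_ids, max_length, bos_id=50256, eos_id=50256, pad_id=50256):\n",
    "    # Reserve two slots for BOS and EOS, truncating the content if needed\n",
    "    body = list(token_ids)[: max_length - 2]\n",
    "    ids = [bos_id] + body + [eos_id]\n",
    "    n_pad = max_length - len(ids)\n",
    "    # Left padding: pad tokens go before the sequence\n",
    "    input_ids = [pad_id] * n_pad + ids\n",
    "    # 0 marks padding positions the model should ignore, 1 marks real tokens\n",
    "    attention_mask = [0] * n_pad + [1] * len(ids)\n",
    "    return input_ids, attention_mask\n",
    "\n",
    "ids, mask = add_special_and_pad([11, 22, 33], max_length=8)\n",
    "print(ids)   # [50256, 50256, 50256, 50256, 11, 22, 33, 50256]\n",
    "print(mask)  # [0, 0, 0, 1, 1, 1, 1, 1]\n",
    "```\n",
    "\n",
    "Because the pad and BOS IDs are the same here, the attention mask is what tells the model which positions are padding."
   ]
  },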
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "OKHhvxK83m19"
   },
   "outputs": [],
   "source": [
    "print(tokenized_train_dataset[1]['input_ids'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "I6LRa2Zm3m19"
   },
   "source": [
    "Now all the samples should be the same length, `max_length`."
   ]
  },
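  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Besides the histogram, a quick programmatic check is to collect the set of distinct lengths, which should contain only `max_length`. The helper below is a hypothetical sketch that works on any iterable of dicts with `'input_ids'` and `'labels'` keys, such as the tokenized datasets; the toy rows stand in for real data.\n",
    "\n",
    "```python\n",
    "# Hypothetical helper: raise if any sample's input_ids or labels differ from the expected length\n",
    "def check_uniform_length(rows, expected):\n",
    "    lengths = {len(row['input_ids']) for row in rows}\n",
    "    lengths |= {len(row['labels']) for row in rows}\n",
    "    if lengths != {expected}:\n",
    "        raise ValueError(f'expected every sample to have {expected} tokens, found lengths {sorted(lengths)}')\n",
    "    return True\n",
    "\n",
    "# Toy rows standing in for tokenized_train_dataset / tokenized_val_dataset\n",
    "rows = [{'input_ids': [1, 2, 3, 4], 'labels': [1, 2, 3, 4]},\n",
    "        {'input_ids': [5, 6, 7, 8], 'labels': [5, 6, 7, 8]}]\n",
    "print(check_uniform_length(rows, expected=4))\n",
    "```"
   ]
  },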
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "id": "I55Yo3yy3m19",
    "outputId": "c87e344d-e0f3-4542-afcc-4e2025926d64"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "204\n"
     ]
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAA1IAAAIjCAYAAAAJLyrXAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjguMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8g+/7EAAAACXBIWXMAAA9hAAAPYQGoP6dpAABLkklEQVR4nO3deXwN9/7H8feRyL4JIgmRxL6rvSpVaqdUpbWUFqW62CrV+ulmaV2tolSVbqRaumjR0ktrCVpFLQ3lEqL2JLhViShJJPP7o4+c2yMRmUhysryej8c8buc735n5zMkI7/ud+R6LYRiGAAAAAAC5VsbeBQAAAABAcUOQAgAAAACTCFIAAAAAYBJBCgAAAABMIkgBAAAAgEkEKQAAAAAwiSAFAAAAACYRpAAAAADAJIIUAAAAAJhEkAJQ6k2ePFkWi6VQztWuXTu1a9fOur5582ZZLBZ99dVXhXL+IUOGKCQkpFDOlVfJyckaPny4/P39ZbFY9Mwzz9i7pHxX2D/3W1m3bp3uuOMOubi4yGKx6NKlS9n2i4yMlMVi0YkTJwq1voJg5lpCQkI0ZMiQAq8JQPFCkAJQomT+4yhzcXFxUWBgoLp06aK3335bly9fzpfzxMXFafLkyYqOjs6X4+WnolxbbvzrX/9SZGSknnrqKX3yySd65JFHbto3JCRE9913XyFWZ86yZcs0Z84ce5eRoz/++EN9+/aVq6ur5s+fr08++UTu7u72LitX/vOf/2jy5MklItgBKH4c7V0AABSEqVOnKjQ0VGlpaUpISNDmzZv1zDPPaPbs2fr222/VqFEja9+XXnpJ//d//2fq+HFxcZoyZYpCQkJ0xx135Hq/H374wdR58iKn2j744ANlZGQUeA23Y9OmTbrzzjs1adIke5dy25YtW6YDBw4U6VG1Xbt26fLly3r11VfVsWPHHPs+8sgj6t+/v5ydnQupupz95z//0ZQpU9SuXTvTI61F7VoAFD8EKQAlUrdu3dS8eXPr+sSJE7Vp0ybdd9996tWrlw4dOiRXV1dJkqOjoxwdC/bX4V9//SU3Nzc5OTkV6HlupWzZsnY9f26cP39e9erVs3cZpcb58+clST4+Prfs6+DgIAcHhwKuqHCUpGsBYB882geg1Lj33nv18ssv6+TJk/r000+t7dm9I7V+/XqFhYXJx8dHHh4eql27tl544QVJf7/f0qJFC0nS0KFDrY8RRkZGSvr7PagGDRpoz549atu2rdzc3Kz73viOVKb09HS98MIL8vf3l7u7u3r16qXTp0/b9LnZexr/POatasvuHakrV67o2WefVVBQkJydnVW7dm3NnDlThmHY9LNYLBo1apRWrVqlBg0ayNnZWfXr19e6deuy/8BvcP78eQ0bNkyVKlWSi4uLGjdurI8//ti6PfO9oePHj+u7776z1p4fj219+umnatasmVxdXeXr66v+/ftn+Xwzf27/+c9/1L59e7m5ualy5cqaMWNGluOdPHlSvXr1kru7u/z8/DRu3Dh9//33slgs2rx5s/V43333nU6ePGm9lhs/+4yMDE2bNk1VqlSRi4uLOnTooNjYWJs+R48eVXh4uPz9/eXi4qIqVaqof//+SkxMvOV1L1++3HrdFSpU0KBBg3T27Fmbax48eLAkqUWLFrJYLDm+C5Tde0WZj1f+9NNPatmypVxcXFStWjUtWbIk2323bt2qJ554QuXLl5eXl5ceffRR/fnnnzZ9LRaLJk+enOX8//wzEBkZqYceekiS1L59e+tnnPn530p212IYhl577TVVqVJFbm5uat++vQ4ePJhl37S0NE2ZMkU1a9aUi4uLypcvr7CwMK1fvz5X5wZQMjAiBaBUeeSRR/TCCy/ohx9+0OOPP55tn4MHD+q+++5To0aNNHXqVDk7Oys2Nlbbtm2TJNWtW1dTp07VK6+8ohEjRujuu++WJN11113WY/zxxx/q1q2b+vfvr0GDBqlS
pUo51jVt2jRZLBZNmDBB58+f15w5c9SxY0dFR0dbR85yIze1/ZNhGOrVq5eioqI0bNgw3XHHHfr+++/13HPP6ezZs3rrrbds+v/0009asWKFnn76aXl6eurtt99WeHi4Tp06pfLly9+0rqtXr6pdu3aKjY3VqFGjFBoaquXLl2vIkCG6dOmSxo4dq7p16+qTTz7RuHHjVKVKFT377LOSpIoVK+b6+rMzbdo0vfzyy+rbt6+GDx+uCxcuaN68eWrbtq1+/fVXm5GYP//8U127dlWfPn3Ut29fffXVV5owYYIaNmyobt26Sfo7eN57772Kj4/X2LFj5e/vr2XLlikqKsrmvC+++KISExN15swZ6+fo4eFh0+f1119XmTJlNH78eCUmJmrGjBkaOHCgdu7cKUlKTU1Vly5dlJKSotGjR8vf319nz57VmjVrdOnSJXl7e9/0uiMjIzV06FC1aNFC06dP17lz5zR37lxt27bNet0vvviiateurffff9/6OGz16tVNf8axsbF68MEHNWzYMA0ePFiLFi3SkCFD1KxZM9WvX9+m76hRo+Tj46PJkycrJiZGCxYs0MmTJ61BOrfatm2rMWPG6O2339YLL7ygunXrSpL1f/PilVde0Wuvvabu3bure/fu2rt3rzp37qzU1FSbfpMnT9b06dM1fPhwtWzZUklJSdq9e7f27t2rTp065fn8AIoZAwBKkMWLFxuSjF27dt20j7e3t9GkSRPr+qRJk4x//jp86623DEnGhQsXbnqMXbt2GZKMxYsXZ9l2zz33GJKMhQsXZrvtnnvusa5HRUUZkozKlSsbSUlJ1vYvv/zSkGTMnTvX2hYcHGwMHjz4lsfMqbbBgwcbwcHB1vVVq1YZkozXXnvNpt+DDz5oWCwWIzY21tomyXBycrJp27dvnyHJmDdvXpZz/dOcOXMMScann35qbUtNTTVat25teHh42Fx7cHCw0aNHjxyPl9u+J06cMBwcHIxp06bZtP/222+Go6OjTXvmz23JkiXWtpSUFMPf398IDw+3ts2aNcuQZKxatcradvXqVaNOnTqGJCMqKsra3qNHD5vPO1Pmz71u3bpGSkqKtX3u3LmGJOO3334zDMMwfv31V0OSsXz58lt/GP+Qmppq+Pn5GQ0aNDCuXr1qbV+zZo0hyXjllVesbbn5M3Nj3+PHj1vbgoODDUnG1q1brW3nz583nJ2djWeffTbLvs2aNTNSU1Ot7TNmzDAkGd988421TZIxadKkLOe/8c/A8uXLs3zmuXXjtZw/f95wcnIyevToYWRkZFj7vfDCC4Ykm/M2btw41/cogJKLR/sAlDoeHh45zt6XOULxzTff5HliBmdnZw0dOjTX/R999FF5enpa1x988EEFBATo3//+d57On1v//ve/5eDgoDFjxti0P/vsszIMQ2vXrrVp79ixo82IRaNGjeTl5aXff//9lufx9/fXgAEDrG1ly5bVmDFjlJycrC1btuTD1WS1YsUKZWRkqG/fvvrvf/9rXfz9/VWzZs0so0geHh4aNGiQdd3JyUktW7a0ub5169apcuXK6tWrl7XNxcXlpiOcORk6dKjNe3OZI4iZ58sccfr+++/1119/5fq4u3fv1vnz5/X000/LxcXF2t6jRw/VqVNH3333nelac1KvXj1r7dLfo4i1a9fO9r4YMWKEzbt6Tz31lBwdHQv8Xr+VDRs2KDU1VaNHj7YZGctuohAfHx8dPHhQR48eLcQKARQ1BCkApU5ycrJNaLlRv3791KZNGw0fPlyVKlVS//799eWXX5oKVZUrVzY1sUTNmjVt1i0Wi2rUqFHg0zqfPHlSgYGBWT6PzMejTp48adNetWrVLMcoV65clndcsjtPzZo1VaaM7V87NztPfjl69KgMw1DNmjVVsWJFm+XQoUPWiRYyValSJcvjZTde38mTJ1W9evUs/WrUqGG6vhs/z3LlykmS9XyhoaGKiIjQhx9+qAoVKqhLly6aP3/+
Ld+Pyvw8a9eunWVbnTp18v3zNnNf3Hive3h4KCAgwO5TmGd+JjfWV7FiRevPJdPUqVN16dIl1apVSw0bNtRzzz2n/fv3F1qtAIoGghSAUuXMmTNKTEzM8R+9rq6u2rp1qzZs2KBHHnlE+/fvV79+/dSpUyelp6fn6jxm3mvKrZu9P5LbmvLDzWY5M26YmKKoyMjIkMVi0bp167R+/fosy3vvvWfTv7CvLzfnmzVrlvbv368XXnhBV69e1ZgxY1S/fn2dOXOmQGrKi8L63ArzXs9J27ZtdezYMS1atEgNGjTQhx9+qKZNm+rDDz+0d2kAChFBCkCp8sknn0iSunTpkmO/MmXKqEOHDpo9e7b+85//aNq0adq0aZP1UTAzL8Xnxo2PCBmGodjYWJtZ3sqVK6dLly5l2ffG0QUztQUHBysuLi7Lo46HDx+2bs8PwcHBOnr0aJZRvfw+z42qV68uwzAUGhqqjh07ZlnuvPNO08cMDg7WsWPHsoSEG2fbk/LvPmnYsKFeeuklbd26VT/++KPOnj2rhQsX5lijJMXExGTZFhMTU2Cfd27ceK8nJycrPj7+lvd6amqq4uPjbdry889h5mdyY30XLlzIdmTN19dXQ4cO1WeffabTp0+rUaNG2c40CKDkIkgBKDU2bdqkV199VaGhoRo4cOBN+128eDFLW+YX26akpEiS3N3dJSnbYJMXS5YssQkzX331leLj460zxUl/h4IdO3bYzCC2Zs2aLNN4m6mte/fuSk9P1zvvvGPT/tZbb8lisdic/3Z0795dCQkJ+uKLL6xt169f17x58+Th4aF77rknX85zoz59+sjBwUFTpkzJEnwMw9Aff/xh+phdunTR2bNn9e2331rbrl27pg8++CBLX3d391xNU34zSUlJun79uk1bw4YNVaZMGeu9mJ3mzZvLz89PCxcutOm3du1aHTp0SD169MhzTbfr/fffV1pamnV9wYIFun79epZ7fevWrVn2u3FEKj//HHbs2FFly5bVvHnzbO6VOXPmZOl7433j4eGhGjVq5PgzAVDyMP05gBJp7dq1Onz4sK5fv65z585p06ZNWr9+vYKDg/Xtt9/avIB/o6lTp2rr1q3q0aOHgoODdf78eb377ruqUqWKwsLCJP39Dz0fHx8tXLhQnp6ecnd3V6tWrRQaGpqnen19fRUWFqahQ4fq3LlzmjNnjmrUqGEzgcHw4cP11VdfqWvXrurbt6+OHTumTz/9NMt01WZq69mzp9q3b68XX3xRJ06cUOPGjfXDDz/om2++0TPPPJOnqbCzM2LECL333nsaMmSI9uzZo5CQEH311Vfatm2b5syZk+M7a7cSGxur1157LUt7kyZN1KNHD7322muaOHGiTpw4od69e8vT01PHjx/XypUrNWLECI0fP97U+Z544gm98847GjBggMaOHauAgAAtXbrUek/9c5SkWbNm+uKLLxQREaEWLVrIw8NDPXv2zPW5Nm3apFGjRumhhx5SrVq1dP36dX3yySdycHBQeHj4TfcrW7as3njjDQ0dOlT33HOPBgwYYJ3+PCQkROPGjTN1zfkpNTVVHTp0UN++fRUTE6N3331XYWFhNpN3DB8+XE8++aTCw8PVqVMn7du3T99//70qVKhgc6w77rhDDg4OeuONN5SYmChnZ2fde++98vPzM11XxYoVNX78eE2fPl333Xefunfvrl9//VVr167Nct569eqpXbt2atasmXx9fbV792599dVXGjVqVN4+FADFk30mCwSAgpE5pXHm4uTkZPj7+xudOnUy5s6dazPNdqYbpz/fuHGjcf/99xuBgYGGk5OTERgYaAwYMMA4cuSIzX7ffPONUa9ePcPR0dFmuvF77rnHqF+/frb13Wz6888++8yYOHGi4efnZ7i6uho9evQwTp48mWX/WbNmGZUrVzacnZ2NNm3aGLt3785yzJxqu3H6c8MwjMuXLxvjxo0zAgMDjbJlyxo1a9Y03nzzTZsp
oA3j7ympR44cmaWmm03LfqNz584ZQ4cONSpUqGA4OTkZDRs2zHaKdrPTn//z5/3PZdiwYdZ+X3/9tREWFma4u7sb7u7uRp06dYyRI0caMTEx1j43+7ll95n9/vvvRo8ePQxXV1ejYsWKxrPPPmt8/fXXhiRjx44d1n7JycnGww8/bPj4+BiSrMfJ/LnfOK358ePHbX5ev//+u/HYY48Z1atXN1xcXAxfX1+jffv2xoYNG3L1+XzxxRdGkyZNDGdnZ8PX19cYOHCgcebMGZs++TH9eXY/rxvvy8x9t2zZYowYMcIoV66c4eHhYQwcOND4448/bPZNT083JkyYYFSoUMFwc3MzunTpYsTGxmZ7r33wwQdGtWrVDAcHB1NToWd3Lenp6caUKVOMgIAAw9XV1WjXrp1x4MCBLOd97bXXjJYtWxo+Pj6Gq6urUadOHWPatGk207oDKPkshlFE3xAGAKAYmTNnjsaNG6czZ86ocuXK9i6nyMn8guBdu3apefPm9i4HAG4b70gBAGDS1atXbdavXbum9957TzVr1iREAUApwTtSAACY1KdPH1WtWlV33HGHEhMT9emnn+rw4cNaunSpvUsr9ZKTk5WcnJxjn4oVK950ynYAyC2CFAAAJnXp0kUffvihli5dqvT0dNWrV0+ff/65+vXrZ+/SSr2ZM2dqypQpOfY5fvy4zXTrAJAXvCMFAABKjN9//12///57jn3CwsJynLkTAHKDIAUAAAAAJjHZBAAAAACYxDtSkjIyMhQXFydPT0+bL1IEAAAAULoYhqHLly8rMDBQZcrcfNyJICUpLi5OQUFB9i4DAAAAQBFx+vRpValS5abbCVKSPD09Jf39YXl5edm5GgAAAAD2kpSUpKCgIGtGuBmClGR9nM/Ly4sgBQAAAOCWr/ww2QQAAAAAmESQAgAAAACTCFIAAAAAYBJBCgAAAABMIkgBAAAAgEkEKQAAAAAwiSAFAAAAACYRpAAAAADAJIIUAAAAAJhEkAIAAAAAkwhSAAAAAGASQQoAAAAATCJIAQAAAIBJBCkAAAAAMIkgBQAAAAAmEaQAAAAAwCSCFAAAAACYRJACAAAAAJMIUgAAAABgkqO9CwAAoKjo2dPeFfzP6tX2rgAAkBNGpAAAAADAJIIUAAAAAJhEkAIAAAAAk+wapKZPn64WLVrI09NTfn5+6t27t2JiYmz6XLt2TSNHjlT58uXl4eGh8PBwnTt3zqbPqVOn1KNHD7m5ucnPz0/PPfecrl+/XpiXAgAAAKAUsWuQ2rJli0aOHKkdO3Zo/fr1SktLU+fOnXXlyhVrn3Hjxmn16tVavny5tmzZori4OPXp08e6PT09XT169FBqaqp+/vlnffzxx4qMjNQrr7xij0sCAAAAUApYDMMw7F1EpgsXLsjPz09btmxR27ZtlZiYqIoVK2rZsmV68MEHJUmHDx9W3bp1tX37dt15551au3at7rvvPsXFxalSpUqSpIULF2rChAm6cOGCnJycbnnepKQkeXt7KzExUV5eXgV6jQCAootZ+wAAuc0GReodqcTEREmSr6+vJGnPnj1KS0tTx44drX3q1KmjqlWravv27ZKk7du3q2HDhtYQJUldunRRUlKSDh48mO15UlJSlJSUZLMAAAAAQG4VmSCVkZGhZ555Rm3atFGDBg0kSQkJCXJycpKPj49N30qVKikhIcHa558hKnN75rbsTJ8+Xd7e3tYlKCgon68GAAAAQElWZILUyJEjdeDAAX3++ecFfq6JEycqMTHRupw+fbrAzwkAAACg5HC0dwGSNGrUKK1Zs0Zbt25VlSpVrO3+/v5KTU3VpUuXbEalzp07J39/f2ufX375xeZ4mbP6Zfa5kbOzs5ydnfP5KgAAAACUFnYdkTIMQ6NGjdLKlSu1adMmhYaG2mxv1qyZypYtq40bN1rbYmJidOrUKbVu3VqS1Lp1a/322286f/68tc/69evl5eWlevXqFc6FAAAAAChV7Doi
NXLkSC1btkzffPONPD09re80eXt7y9XVVd7e3ho2bJgiIiLk6+srLy8vjR49Wq1bt9add94pSercubPq1aunRx55RDNmzFBCQoJeeukljRw5klEnAAAAAAXCrkFqwYIFkqR27drZtC9evFhDhgyRJL311lsqU6aMwsPDlZKSoi5duujdd9+19nVwcNCaNWv01FNPqXXr1nJ3d9fgwYM1derUwroMAAAAAKVMkfoeKXvhe6QAABLfIwUAKKbfIwUAAAAAxQFBCgAAAABMIkgBAAAAgEkEKQAAAAAwiSAFAAAAACYRpAAAAADAJIIUAAAAAJhEkAIAAAAAkwhSAAAAAGASQQoAAAAATCJIAQAAAIBJBCkAAAAAMIkgBQAAAAAmEaQAAAAAwCSCFAAAAACYRJACAAAAAJMIUgAAAABgEkEKAAAAAEwiSAEAAACASQQpAAAAADCJIAUAAAAAJhGkAAAAAMAkghQAAAAAmESQAgAAAACTCFIAAAAAYBJBCgAAAABMIkgBAAAAgEkEKQAAAAAwiSAFAAAAACYRpAAAAADAJIIUAAAAAJhEkAIAAAAAkwhSAAAAAGASQQoAAAAATCJIAQAAAIBJBCkAAAAAMIkgBQAAAAAmEaQAAAAAwCSCFAAAAACYRJACAAAAAJMIUgAAAABgEkEKAAAAAEwiSAEAAACASXYNUlu3blXPnj0VGBgoi8WiVatW2Wy3WCzZLm+++aa1T0hISJbtr7/+eiFfCQAAAIDSxK5B6sqVK2rcuLHmz5+f7fb4+HibZdGiRbJYLAoPD7fpN3XqVJt+o0ePLozyAQAAAJRSjvY8ebdu3dStW7ebbvf397dZ/+abb9S+fXtVq1bNpt3T0zNLXwAAAAAoKMXmHalz587pu+++07Bhw7Jse/3111W+fHk1adJEb775pq5fv57jsVJSUpSUlGSzAAAAAEBu2XVEyoyPP/5Ynp6e6tOnj037mDFj1LRpU/n6+urnn3/WxIkTFR8fr9mzZ9/0WNOnT9eUKVMKumQAAAAAJZTFMAzD3kVIf08ssXLlSvXu3Tvb7XXq1FGnTp00b968HI+zaNEiPfHEE0pOTpazs3O2fVJSUpSSkmJdT0pKUlBQkBITE+Xl5ZXnawAAFG89e9q7gv9ZvdreFQBA6ZSUlCRvb+9bZoNiMSL1448/KiYmRl988cUt+7Zq1UrXr1/XiRMnVLt27Wz7ODs73zRkAQAAAMCtFIt3pD766CM1a9ZMjRs3vmXf6OholSlTRn5+foVQGQAAAIDSyK4jUsnJyYqNjbWuHz9+XNHR0fL19VXVqlUl/T20tnz5cs2aNSvL/tu3b9fOnTvVvn17eXp6avv27Ro3bpwGDRqkcuXKFdp1AAAAAChd7Bqkdu/erfbt21vXIyIiJEmDBw9WZGSkJOnzzz+XYRgaMGBAlv2dnZ31+eefa/LkyUpJSVFoaKjGjRtnPQ4AAAAAFIQiM9mEPeX2hTIAQMnGZBMAgNxmg2LxjhQAAAAAFCUEKQAAAAAwiSAFAAAAACYRpAAAAADAJIIUAAAAAJhEkAIAAAAAkwhSAAAAAGASQQoAAAAATCJIAQAAAIBJBCkAAAAAMIkgBQAAAAAmEaQAAAAAwCSCFAAAAACYRJACAAAAAJMIUgAAAABgEkEKAAAAAEwiSAEAAACASQQpAAAAADCJIAUAAAAAJhGkAAAAAMAkghQAAAAAmESQAgAAAACTCFIAAAAAYBJBCgAAAABMIkgBAAAAgEkEKQAAAAAwiSAFAAAAACYRpAAAAADAJIIUAAAAAJhEkAIAAAAAkwhSAAAAAGASQQoAAAAATCJIAQAAAIBJBCkAAAAAMIkgBQAAAAAmEaQAAAAAwCSCFAAAAACYRJACAAAAAJMIUgAAAABgEkEKAAAAAEwiSAEAAACASQQpAAAAADCJIAUAAAAAJtk1SG3dulU9e/ZUYGCgLBaLVq1aZbN9yJAhslgsNkvXrl1t+ly8eFEDBw6U
l5eXfHx8NGzYMCUnJxfiVQAAAAAobewapK5cuaLGjRtr/vz5N+3TtWtXxcfHW5fPPvvMZvvAgQN18OBBrV+/XmvWrNHWrVs1YsSIgi4dAAAAQCnmaM+Td+vWTd26dcuxj7Ozs/z9/bPddujQIa1bt067du1S8+bNJUnz5s1T9+7dNXPmTAUGBuZ7zQAAAABQ5N+R2rx5s/z8/FS7dm099dRT+uOPP6zbtm/fLh8fH2uIkqSOHTuqTJky2rlz502PmZKSoqSkJJsFAAAAAHKrSAeprl27asmSJdq4caPeeOMNbdmyRd26dVN6erokKSEhQX5+fjb7ODo6ytfXVwkJCTc97vTp0+Xt7W1dgoKCCvQ6AAAAAJQsdn2071b69+9v/e+GDRuqUaNGql69ujZv3qwOHTrk+bgTJ05URESEdT0pKYkwBQAAACDXivSI1I2qVaumChUqKDY2VpLk7++v8+fP2/S5fv26Ll68eNP3qqS/37vy8vKyWQAAAAAgt4pVkDpz5oz++OMPBQQESJJat26tS5cuac+ePdY+mzZtUkZGhlq1amWvMgEAAACUcHZ9tC85Odk6uiRJx48fV3R0tHx9feXr66spU6YoPDxc/v7+OnbsmJ5//nnVqFFDXbp0kSTVrVtXXbt21eOPP66FCxcqLS1No0aNUv/+/ZmxDwAAAECBseuI1O7du9WkSRM1adJEkhQREaEmTZrolVdekYODg/bv369evXqpVq1aGjZsmJo1a6Yff/xRzs7O1mMsXbpUderUUYcOHdS9e3eFhYXp/ffft9clAQAAACgFLIZhGPYuwt6SkpLk7e2txMRE3pcCgFKsZ097V/A/q1fbuwIAKJ1ymw2K1TtSAAAAAFAUEKQAAAAAwCSCFAAAAACYRJACAAAAAJMIUgAAAABgEkEKAAAAAEwiSAEAAACASQQpAAAAADCJIAUAAAAAJhGkAAAAAMAkghQAAAAAmESQAgAAAACTCFIAAAAAYBJBCgAAAABMIkgBAAAAgEkEKQAAAAAwiSAFAAAAACYRpAAAAADAJIIUAAAAAJhEkAIAAAAAkwhSAAAAAGASQQoAAAAATCJIAQAAAIBJBCkAAAAAMIkgBQAAAAAmEaQAAAAAwCSCFAAAAACYRJACAAAAAJMIUgAAAABgEkEKAAAAAEwiSAEAAACASQQpAAAAADCJIAUAAAAAJhGkAAAAAMAkghQAAAAAmESQAgAAAACTCFIAAAAAYBJBCgAAAABMIkgBAAAAgEkEKQAAAAAwiSAFAAAAACYRpAAAAADAJIIUAAAAAJhk1yC1detW9ezZU4GBgbJYLFq1apV1W1pamiZMmKCGDRvK3d1dgYGBevTRRxUXF2dzjJCQEFksFpvl9ddfL+QrAQAAAFCa2DVIXblyRY0bN9b8+fOzbPvrr7+0d+9evfzyy9q7d69WrFihmJgY9erVK0vfqVOnKj4+3rqMHj26MMoHAAAAUEo52vPk3bp1U7du3bLd5u3trfXr19u0vfPOO2rZsqVOnTqlqlWrWts9PT3l7+9foLUCAAAAQKZi9Y5UYmKiLBaLfHx8bNpff/11lS9fXk2aNNGbb76p69ev53iclJQUJSUl2SwAAAAAkFt2HZEy49q1a5owYYIGDBggLy8va/uYMWPUtGlT+fr66ueff9bEiRMVHx+v2bNn3/RY06dP15QpUwqjbAAAAAAlkMUwDMPeRUiSxWLRypUr1bt37yzb0tLSFB4erjNnzmjz5s02QepGixYt0hNPPKHk5GQ5Oztn2yclJUUpKSnW9aSkJAUFBSkxMTHHYwMASraePe1dwf+sXm3vCgCgdEpKSpK3t/cts0GRH5FKS0tT3759dfLkSW3atOmWQadVq1a6fv26Tpw4odq1a2fbx9nZ+aYhCwAAAABupUgHqcwQdfToUUVFRal8+fK33Cc6OlplypSRn59fIVQIAAAAoDSya5BKTk5WbGysdf348eOKjo6Wr6+vAgIC9OCD
D2rv3r1as2aN0tPTlZCQIEny9fWVk5OTtm/frp07d6p9+/by9PTU9u3bNW7cOA0aNEjlypWz12UBAAAAKOHs+o7U5s2b1b59+yztgwcP1uTJkxUaGprtflFRUWrXrp327t2rp59+WocPH1ZKSopCQ0P1yCOPKCIiwtSje7l9DhIAULLxjhQAoEDfkfr9999VrVq1PBeXqV27dsopx90q4zVt2lQ7duy47ToAAAAAwIw8fY9UjRo11L59e3366ae6du1aftcEAAAAAEVanoLU3r171ahRI0VERMjf319PPPGEfvnll/yuDQAAAACKpDwFqTvuuENz585VXFycFi1apPj4eIWFhalBgwaaPXu2Lly4kN91AgAAAECRkacglcnR0VF9+vTR8uXL9cYbbyg2Nlbjx49XUFCQHn30UcXHx+dXnQAAAABQZNxWkNq9e7eefvppBQQEaPbs2Ro/fryOHTum9evXKy4uTvfff39+1QkAAAAARUaeZu2bPXu2Fi9erJiYGHXv3l1LlixR9+7dVabM37ksNDRUkZGRCgkJyc9aAQAAAKBIyFOQWrBggR577DENGTJEAQEB2fbx8/PTRx99dFvFAQAAAEBRlKcgdfTo0Vv2cXJy0uDBg/NyeAAAAAAo0vL0jtTixYu1fPnyLO3Lly/Xxx9/fNtFAQAAAEBRlqcgNX36dFWoUCFLu5+fn/71r3/ddlEAAAAAUJTlKUidOnVKoaGhWdqDg4N16tSp2y4KAAAAAIqyPAUpPz8/7d+/P0v7vn37VL58+dsuCgAAAACKsjwFqQEDBmjMmDGKiopSenq60tPTtWnTJo0dO1b9+/fP7xoBAAAAoEjJ06x9r776qk6cOKEOHTrI0fHvQ2RkZOjRRx/lHSkAAAAAJV6egpSTk5O++OILvfrqq9q3b59cXV3VsGFDBQcH53d9AAAAAFDk5ClIZapVq5Zq1aqVX7UAAAAAQLGQpyCVnp6uyMhIbdy4UefPn1dGRobN9k2bNuVLcQAAAABQFOUpSI0dO1aRkZHq0aOHGjRoIIvFkt91AQAAAECRlacg9fnnn+vLL79U9+7d87seAAAAACjy8jT9uZOTk2rUqJHftQAAAABAsZCnIPXss89q7ty5Mgwjv+sBAAAAgCIvT4/2/fTTT4qKitLatWtVv359lS1b1mb7ihUr8qU4AAAAACiK8hSkfHx89MADD+R3LQAAAABQLOQpSC1evDi/6wAAAACAYiNP70hJ0vXr17Vhwwa99957unz5siQpLi5OycnJ+VYcAAAAABRFeRqROnnypLp27apTp04pJSVFnTp1kqenp9544w2lpKRo4cKF+V0nAAAAABQZeRqRGjt2rJo3b64///xTrq6u1vYHHnhAGzduzLfiAAAAAKAoytOI1I8//qiff/5ZTk5ONu0hISE6e/ZsvhQGAAAAAEVVnkakMjIylJ6enqX9zJkz8vT0vO2iAAAAAKAoy1OQ6ty5s+bMmWNdt1gsSk5O1qRJk9S9e/f8qg0AAAAAiqQ8Pdo3a9YsdenSRfXq1dO1a9f08MMP6+jRo6pQoYI+++yz/K4RAAAAAIqUPAWpKlWqaN++ffr888+1f/9+JScna9iwYRo4cKDN5BMAAAAAUBLlKUhJkqOjowYNGpSftQAAAABAsZCnILVkyZIctz/66KN5KgYAAAAAioM8BamxY8farKelpemvv/6Sk5OT3NzcCFIAAAAASrQ8zdr3559/2izJycmKiYlRWFgYk00AAAAAKPHyFKSyU7NmTb3++utZRqsAAAAAoKTJtyAl/T0BRVxcXH4eEgAAAACKnDy9I/Xtt9/arBuGofj4eL3zzjtq06ZNvhQGAAAAAEVVnoJU7969bdYtFosqVqyoe++9V7NmzcqPugAAAACgyMpTkMrIyMjvOgAAAACg2MjXd6QAAAAAoDTI04hURERErvvOnj07L6cAAAAAgCIrT0Hq119/1a+//qq0tDTVrl1bknTkyBE5
ODioadOm1n4WiyXH42zdulVvvvmm9uzZo/j4eK1cudLm/SvDMDRp0iR98MEHunTpktq0aaMFCxaoZs2a1j4XL17U6NGjtXr1apUpU0bh4eGaO3euPDw88nJpAAAAAHBLeXq0r2fPnmrbtq3OnDmjvXv3au/evTp9+rTat2+v++67T1FRUYqKitKmTZtyPM6VK1fUuHFjzZ8/P9vtM2bM0Ntvv62FCxdq586dcnd3V5cuXXTt2jVrn4EDB+rgwYNav3691qxZo61bt2rEiBF5uSwAAAAAyBWLYRiG2Z0qV66sH374QfXr17dpP3DggDp37pyn75KyWCw2I1KGYSgwMFDPPvusxo8fL0lKTExUpUqVFBkZqf79++vQoUOqV6+edu3apebNm0uS1q1bp+7du+vMmTMKDAzM1bmTkpLk7e2txMREeXl5ma4dAFAy9Oxp7wr+Z/Vqe1cAAKVTbrNBnkakkpKSdOHChSztFy5c0OXLl/NyyCyOHz+uhIQEdezY0drm7e2tVq1aafv27ZKk7du3y8fHxxqiJKljx44qU6aMdu7cedNjp6SkKCkpyWYBAAAAgNzKU5B64IEHNHToUK1YsUJnzpzRmTNn9PXXX2vYsGHq06dPvhSWkJAgSapUqZJNe6VKlazbEhIS5OfnZ7Pd0dFRvr6+1j7ZmT59ury9va1LUFBQvtQMAAAAoHTIU5BauHChunXrpocffljBwcEKDg7Www8/rK5du+rdd9/N7xrz3cSJE5WYmGhdTp8+be+SAAAAABQjeZq1z83NTe+++67efPNNHTt2TJJUvXp1ubu751th/v7+kqRz584pICDA2n7u3Dndcccd1j7nz5+32e/69eu6ePGidf/sODs7y9nZOd9qBQAAAFC63NYX8sbHxys+Pl41a9aUu7u78jBvxU2FhobK399fGzdutLYlJSVp586dat26tSSpdevWunTpkvbs2WPts2nTJmVkZKhVq1b5VgsAAAAA/FOeRqT++OMP9e3bV1FRUbJYLDp69KiqVaumYcOGqVy5cpo1a1aujpOcnKzY2Fjr+vHjxxUdHS1fX19VrVpVzzzzjF577TXVrFlToaGhevnllxUYGGid2a9u3brq2rWrHn/8cS1cuFBpaWkaNWqU+vfvn+sZ+wAAAADArDyNSI0bN05ly5bVqVOn5ObmZm3v16+f1q1bl+vj7N69W02aNFGTJk0kSREREWrSpIleeeUVSdLzzz+v0aNHa8SIEWrRooWSk5O1bt06ubi4WI+xdOlS1alTRx06dFD37t0VFham999/Py+XBQAAAAC5kqfvkfL399f333+vxo0by9PTU/v27VO1atX0+++/q1GjRkpOTi6IWgsM3yMFAJD4HikAQAF/j9SVK1dsRqIyXbx4kUkcAAAAAJR4eQpSd999t5YsWWJdt1gsysjI0IwZM9S+fft8Kw4AAAAAiqI8TTYxY8YMdejQQbt371Zqaqqef/55HTx4UBcvXtS2bdvyu0YAAAAAKFLyNCLVoEEDHTlyRGFhYbr//vt15coV9enTR7/++quqV6+e3zUCAAAAQJFiekQqLS1NXbt21cKFC/Xiiy8WRE0AAAAAUKSZHpEqW7as9u/fXxC1AAAAAECxkKdH+wYNGqSPPvoov2sBAAAAgGIhT5NNXL9+XYsWLdKGDRvUrFkzubu722yfPXt2vhQHAAAAAEWRqSD1+++/KyQkRAcOHFDTpk0lSUeOHLHpY7FY8q86AAAAACiCTAWpmjVrKj4+XlFRUZKkfv366e2331alSpUKpDgAAAAAKIpMvSNlGIbN+tq1a3XlypV8LQgAAAAAiro8TTaR6cZgBQAAAAClgakgZbFYsrwDxTtRAAAAAEobU+9IGYahIUOGyNnZWZJ07do1Pfnkk1lm7VuxYkX+VQgAAAAARYypIDV48GCb9UGDBuVrMQAAAABQHJgKUosXLy6oOgAAAACg2LitySYAAAAAoDQiSAEAAACASQQp
AAAAADCJIAUAAAAAJhGkAAAAAMAkghQAAAAAmESQAgAAAACTCFIAAAAAYBJBCgAAAABMIkgBAAAAgEkEKQAAAAAwiSAFAAAAACYRpAAAAADAJIIUAAAAAJhEkAIAAAAAkwhSAAAAAGASQQoAAAAATCJIAQAAAIBJBCkAAAAAMIkgBQAAAAAmEaQAAAAAwCSCFAAAAACYRJACAAAAAJMIUgAAAABgEkEKAAAAAEwiSAEAAACASQQpAAAAADCpyAepkJAQWSyWLMvIkSMlSe3atcuy7cknn7Rz1QAAAABKMkd7F3Aru3btUnp6unX9wIED6tSpkx566CFr2+OPP66pU6da193c3Aq1RgAAAAClS5EPUhUrVrRZf/3111W9enXdc8891jY3Nzf5+/sXdmkAAAAASqki/2jfP6WmpurTTz/VY489JovFYm1funSpKlSooAYNGmjixIn666+/cjxOSkqKkpKSbBYAAAAAyK0iPyL1T6tWrdKlS5c0ZMgQa9vDDz+s4OBgBQYGav/+/ZowYYJiYmK0YsWKmx5n+vTpmjJlSiFUDAAAAKAkshiGYdi7iNzq0qWLnJyctHr16pv22bRpkzp06KDY2FhVr1492z4pKSlKSUmxriclJSkoKEiJiYny8vLK97oBAMVDz572ruB/cvirDgBQgJKSkuTt7X3LbFBsRqROnjypDRs25DjSJEmtWrWSpByDlLOzs5ydnfO9RgAAAAClQ7F5R2rx4sXy8/NTjx49cuwXHR0tSQoICCiEqgAAAACURsViRCojI0OLFy/W4MGD5ej4v5KPHTumZcuWqXv37ipfvrz279+vcePGqW3btmrUqJEdKwYAAABQkhWLILVhwwadOnVKjz32mE27k5OTNmzYoDlz5ujKlSsKCgpSeHi4XnrpJTtVCgAAAKA0KBZBqnPnzspuToygoCBt2bLFDhUBAAAAKM2KzTtSAAAAAFBUEKQAAAAAwCSCFAAAAACYRJACAAAAAJMIUgAAAABgEkEKAAAAAEwiSAEAAACASQQpAAAAADCJIAUAAAAAJhGkAAAAAMAkghQAAAAAmESQAgAAAACTCFIAAAAAYBJBCgAAAABMIkgBAAAAgEkEKQAAAAAwiSAFAAAAACYRpAAAAADAJIIUAAAAAJhEkAIAAAAAkwhSAAAAAGASQQoAAAAATCJIAQAAAIBJBCkAAAAAMIkgBQAAAAAmEaQAAAAAwCSCFAAAAACYRJACAAAAAJMIUgAAAABgEkEKAAAAAEwiSAEAAACASQQpAAAAADCJIAUAAAAAJhGkAAAAAMAkghQAAAAAmESQAgAAAACTCFIAAAAAYBJBCgAAAABMIkgBAAAAgEkEKQAAAAAwiSAFAAAAACYRpAAAAADAJIIUAAAAAJhUpIPU5MmTZbFYbJY6depYt1+7dk0jR45U+fLl5eHhofDwcJ07d86OFQMAAAAoDYp0kJKk+vXrKz4+3rr89NNP1m3jxo3T6tWrtXz5cm3ZskVxcXHq06ePHasFAAAAUBo42ruAW3F0dJS/v3+W9sTERH300UdatmyZ7r33XknS4sWLVbduXe3YsUN33nlnYZcKAAAAoJQo8iNSR48eVWBgoKpVq6aBAwfq1KlTkqQ9e/YoLS1NHTt2tPatU6eOqlatqu3bt+d4zJSUFCUlJdksAAAAAJBbRTpItWrVSpGRkVq3bp0WLFig48eP6+6779bly5eVkJAgJycn+fj42OxTqVIlJSQk5Hjc6dOny9vb27oEBQUV4FUAAAAAKGmK9KN93bp1s/53o0aN1KpVKwUHB+vLL7+Uq6trno87ceJERUREWNeTkpIIUwAAAAByrUiPSN3Ix8dHtWrVUmxsrPz9/ZWamqpLly7Z9Dl37ly271T9k7Ozs7y8vGwWAAAAAMitYhWkkpOTdezYMQUEBKhZs2YqW7asNm7caN0eExOjU6dOqXXr1nasEgAAAEBJV6Qf7Rs/frx69uyp4OBgxcXFadKk
SXJwcNCAAQPk7e2tYcOGKSIiQr6+vvLy8tLo0aPVunVrZuwDAAAAUKCKdJA6c+aMBgwYoD/++EMVK1ZUWFiYduzYoYoVK0qS3nrrLZUpU0bh4eFKSUlRly5d9O6779q5agAAAAAlncUwDMPeRdhbUlKSvL29lZiYyPtSAFCK9exp7wr+Z/Vqe1cAAKVTbrNBsXpHCgAAAACKAoIUAAAAAJhEkAIAAAAAkwhSAAAAAGASQQoAAAAATCJIAQAAAIBJBCkAAAAAMIkgBQAAAAAmEaQAAAAAwCSCFAAAAACYRJACAAAAAJMIUgAAAABgEkEKAAAAAEwiSAEAAACASQQpAAAAADCJIAUAAAAAJhGkAAAAAMAkghQAAAAAmESQAgAAAACTCFIAAAAAYBJBCgAAAABMIkgBAAAAgEkEKQAAAAAwiSAFAAAAACYRpAAAAADAJIIUAAAAAJhEkAIAAAAAkwhSAAAAAGASQQoAAAAATCJIAQAAAIBJBCkAAAAAMIkgBQAAAAAmEaQAAAAAwCSCFAAAAACYRJACAAAAAJMIUgAAAABgEkEKAAAAAEwiSAEAAACASQQpAAAAADCJIAUAAAAAJhGkAAAAAMAkghQAAAAAmESQAgAAAACTinSQmj59ulq0aCFPT0/5+fmpd+/eiomJsenTrl07WSwWm+XJJ5+0U8UAAAAASoMiHaS2bNmikSNHaseOHVq/fr3S0tLUuXNnXblyxabf448/rvj4eOsyY8YMO1UMAAAAoDRwtHcBOVm3bp3NemRkpPz8/LRnzx61bdvW2u7m5iZ/f//CLg8AAABAKVWkR6RulJiYKEny9fW1aV+6dKkqVKigBg0aaOLEifrrr79yPE5KSoqSkpJsFgAAAADIrSI9IvVPGRkZeuaZZ9SmTRs1aNDA2v7www8rODhYgYGB2r9/vyZMmKCYmBitWLHipseaPn26pkyZUhhlAwAAACiBLIZhGPYuIjeeeuoprV27Vj/99JOqVKly036bNm1Shw4dFBsbq+rVq2fbJyUlRSkpKdb1pKQkBQUFKTExUV5eXvleOwCgeOjZ094V/M/q1fauAABKp6SkJHl7e98yGxSLEalRo0ZpzZo12rp1a44hSpJatWolSTkGKWdnZzk7O+d7nQAAAABKhyIdpAzD0OjRo7Vy5Upt3rxZoaGht9wnOjpakhQQEFDA1QEAAAAorYp0kBo5cqSWLVumb775Rp6enkpISJAkeXt7y9XVVceOHdOyZcvUvXt3lS9fXvv379e4cePUtm1bNWrUyM7VAwAAACipinSQWrBggaS/v3T3nxYvXqwhQ4bIyclJGzZs0Jw5c3TlyhUFBQUpPDxcL730kh2qBQAAAFBaFOkgdat5MIKCgrRly5ZCqgYAAAAA/lasvkcKAAAAAIoCghQAAAAAmESQAgAAAACTCFIAAAAAYBJBCgAAAABMIkgBAAAAgEkEKQAAAAAwiSAFAAAAACYRpAAAAADAJIIUAAAAAJhEkAIAAAAAkwhSAAAAAGASQQoAAAAATCJIAQAAAIBJBCkAAAAAMIkgBQAAAAAmEaQAAAAAwCSCFAAAAACYRJACAAAAAJMIUgAAAABgEkEKAAAAAEwiSAEAAACASQQpAAAAADCJIAUAAAAAJhGkAAAAAMAkghQAAAAAmESQAgAAAACTCFIAAAAAYBJBCgAAAABMIkgBAAAAgEkEKQAAAAAwiSAFAAAAACYRpAAAAADAJIIUAAAAAJhEkAIAAAAAkwhSAAAAAGASQQoAAAAATCJIAQAAAIBJBCkAAAAAMIkgBQAAAAAmEaQAAAAAwCSCFAAAAACYRJACAAAAAJNKTJCaP3++QkJC5OLiolatWumXX36xd0kAAAAASqgSEaS++OILRUREaNKkSdq7d68aN26sLl266Pz58/YuDQAAAEAJVCKC1OzZs/X4449r6NChqlevnhYuXCg3NzctWrTI3qUBAAAAKIEc7V3A7UpNTdWePXs0ceJE
a1uZMmXUsWNHbd++Pdt9UlJSlJKSYl1PTEyUJCUlJRVssQCAIi0tzd4V/A9/JQGAfWRmAsMwcuxX7IPUf//7X6Wnp6tSpUo27ZUqVdLhw4ez3Wf69OmaMmVKlvagoKACqREAALO8ve1dAQCUbpcvX5Z3Dr+Mi32QyouJEycqIiLCup6RkaGLFy+qfPnyslgsdqwMN5OUlKSgoCCdPn1aXl5e9i4HxQD3DMzinoFZ3DMwi3umeDAMQ5cvX1ZgYGCO/Yp9kKpQoYIcHBx07tw5m/Zz587J398/232cnZ3l7Oxs0+bj41NQJSIfeXl58YsHpnDPwCzuGZjFPQOzuGeKvpxGojIV+8kmnJyc1KxZM23cuNHalpGRoY0bN6p169Z2rAwAAABASVXsR6QkKSIiQoMHD1bz5s3VsmVLzZkzR1euXNHQoUPtXRoAAACAEqhEBKl+/frpwoULeuWVV5SQkKA77rhD69atyzIBBYovZ2dnTZo0KcsjmcDNcM/ALO4ZmMU9A7O4Z0oWi3Gref0AAAAAADaK/TtSAAAAAFDYCFIAAAAAYBJBCgAAAABMIkgBAAAAgEkEKRS4yZMny2Kx2Cx16tSxbn///ffVrl07eXl5yWKx6NKlS1mOMW3aNN11111yc3Mz9eXJhw4dUq9eveTt7S13d3e1aNFCp06dyoerQkGx1/2SnJysUaNGqUqVKnJ1dVW9evW0cOHCfLoqFKTbvWdOnDihYcOGKTQ0VK6urqpevbomTZqk1NTUHM977do1jRw5UuXLl5eHh4fCw8OzfDk8iiZ73DMXL17U6NGjVbt2bbm6uqpq1aoaM2aMEhMTC+oykY/s9Xsmk2EY6tatmywWi1atWpWPV4bbUSKmP0fRV79+fW3YsMG67uj4v1vvr7/+UteuXdW1a1dNnDgx2/1TU1P10EMPqXXr1vroo49ydc5jx44pLCxMw4YN05QpU+Tl5aWDBw/KxcXl9i4GBc4e90tERIQ2bdqkTz/9VCEhIfrhhx/09NNPKzAwUL169bq9C0KBu5175vDhw8rIyNB7772nGjVq6MCBA3r88cd15coVzZw586bnHDdunL777jstX75c3t7eGjVqlPr06aNt27bl78WhQBT2PRMXF6e4uDjNnDlT9erV08mTJ/Xkk08qLi5OX331Vf5fIPKdPX7PZJozZ44sFkv+XAjyjwEUsEmTJhmNGze+Zb+oqChDkvHnn3/etM/ixYsNb2/vXJ23X79+xqBBg3JXJIoMe90v9evXN6ZOnWrT1rRpU+PFF1/M1f6wn/y8ZzLNmDHDCA0Nven2S5cuGWXLljWWL19ubTt06JAhydi+fXtuyoYd2eOeyc6XX35pODk5GWlpaab2Q+Gz5z3z66+/GpUrVzbi4+MNScbKlStvXTAKBY/2oVAcPXpUgYGBqlatmgYOHFjgj9dlZGTou+++U61atdSlSxf5+fmpVatWDIcXE4V9v0jSXXfdpW+//VZnz56VYRiKiorSkSNH1Llz5wI/N25fft8ziYmJ8vX1ven2PXv2KC0tTR07drS21alTR1WrVtX27dtv69woHIV9z9xsHy8vL5uRDRRd9rhn/vrrLz388MOaP3++/P39b+t8yH8EKRS4Vq1aKTIyUuvWrdOCBQt0/Phx3X333bp8+XKBnfP8+fNKTk7W66+/rq5du+qHH37QAw88oD59+mjLli0Fdl7cPnvcL5I0b9481atXT1WqVJGTk5O6du2q+fPnq23btgV6Xty+/L5nYmNjNW/ePD3xxBM37ZOQkCAnJ6cs7+BVqlRJCQkJeTovCo897pkb/fe//9Wrr76qESNG5OmcKFz2umfGjRunu+66S/fff3+ezoMCZu8hMZQ+f/75p+Hl5WV8+OGHNu35+ajW2bNnDUnGgAEDbNp79uxp9O/fPy9lw04K434xDMN48803jVq1ahnffvutsW/fPmPevHmGh4eHsX79+tuoHvZwO/fMmTNnjOrVqxvDhg3L
8RxLly41nJycsrS3aNHCeP755/NUN+ynMO6Zf0pMTDRatmxpdO3a1UhNTc1r2bCjwrhnvvnmG6NGjRrG5cuXrW3i0b4ihbFkFDofHx/VqlVLsbGxBXaOChUqyNHRUfXq1bNpr1u3rn766acCOy/yX2HcL1evXtULL7yglStXqkePHpKkRo0aKTo6WjNnzrR5fAtFX17vmbi4OLVv31533XWX3n///Rz7+vv7KzU1VZcuXbIZlTp37hyP3xRDhXHPZLp8+bK6du0qT09PrVy5UmXLls1LybCzwrhnNm3apGPHjmUZ+Q4PD9fdd9+tzZs3m6wa+Y1H+1DokpOTdezYMQUEBBTYOZycnNSiRQvFxMTYtB85ckTBwcEFdl7kv8K4X9LS0pSWlqYyZWx/JTo4OCgjI6PAzouCkZd75uzZs2rXrp2aNWumxYsXZ7kXbtSsWTOVLVtWGzdutLbFxMTo1KlTat26dZ5rh30Uxj0jSUlJSercubOcnJz07bffMotsMVYY98z//d//af/+/YqOjrYukvTWW29p8eLFt1M+8glBCgVu/Pjx2rJli06cOKGff/5ZDzzwgBwcHDRgwABJf79rEB0dbf1/dX777TdFR0fr4sWL1mOcOnVK0dHROnXqlNLT062/UJKTk6196tSpo5UrV1rXn3vuOX3xxRf64IMPFBsbq3feeUerV6/W008/XUhXjrywx/3i5eWle+65R88995w2b96s48ePKzIyUkuWLNEDDzxQiFePvLjdeybzHzdVq1bVzJkzdeHCBSUkJNi863T27FnVqVNHv/zyiyTJ29tbw4YNU0REhKKiorRnzx4NHTpUrVu31p133lnInwDMssc9kxmirly5oo8++khJSUnWfdLT0wv5E4BZ9rhn/P391aBBA5tFkqpWrarQ0NDCvHzcjL2fLUTJ169fPyMgIMBwcnIyKleubPTr18+IjY21bp80aZIhKcuyePFia5/Bgwdn2ycqKsra58Z9DMMwPvroI6NGjRqGi4uL0bhxY2PVqlUFfLW4Xfa6X+Lj440hQ4YYgYGBhouLi1G7dm1j1qxZRkZGRiFcNW7H7d4zixcvznb7P/+KPH78eJZ76OrVq8bTTz9tlCtXznBzczMeeOABIz4+vrAuG7fBHvdM5rsz2S3Hjx8vxKtHXtjr98yNxDtSRYrFMAzjttMYAAAAAJQiPNoHAAAAACYRpAAAAADAJIIUAAAAAJhEkAIAAAAAkwhSAAAAAGASQQoAAAAATCJIAQAAAIBJBCkAAAAAMIkgBQAo8oYMGaLevXvn+3ETEhLUqVMnubu7y8fHp1DPXRBCQkI0Z86cHPtYLBatWrWqUOoBgJKMIAUAkFQ0AsOJEydksVgUHR1dKOd76623FB8fr+joaB05ciTbPnPnzlVkZGSh1PNPkZGRNw13N7Nr1y6NGDGiYAoCANhwtHcBAADYy7Fjx9SsWTPVrFnzpn28vb0LsaLbU7FiRXuXAAClBiNSAIBcOXDggLp16yYPDw9VqlRJjzzyiP773/9at7dr105jxozR888/L19fX/n7+2vy5Mk2xzh8+LDCwsLk4uKievXqacOGDTaPmoWGhkqSmjRpIovFonbt2tnsP3PmTAUEBKh8+fIaOXKk0tLScqx5wYIFql69upycnFS7dm198skn1m0hISH6+uuvtWTJElksFg0ZMiTbY9w4Upeb67RYLFqwYIG6desmV1dXVatWTV999ZV1++bNm2WxWHTp0iVrW3R0tCwWi06cOKHNmzdr6NChSkxMlMVikcViyXKO7Nz4aN/Ro0fVtm1b6+e9fv16m/6pqakaNWqUAgIC5OLiouDgYE2fPv2W5wEAEKQAALlw6dIl3XvvvWrSpIl2796tdevW6dy5c+rbt69Nv48//lju7u7auXOnZsyYoalTp1r/8Z6enq7evXvLzc1NO3fu1Pvvv68XX3zRZv9ffvlFkrRhwwbFx8drxYoV1m1RUVE6duyYoqKi
9PHHHysyMjLHR+5WrlypsWPH6tlnn9WBAwf0xBNPaOjQoYqKipL092NwXbt2Vd++fRUfH6+5c+fm+vPI6TozvfzyywoPD9e+ffs0cOBA9e/fX4cOHcrV8e+66y7NmTNHXl5eio+PV3x8vMaPH5/r+iQpIyNDffr0kZOTk3bu3KmFCxdqwoQJNn3efvttffvtt/ryyy8VExOjpUuXKiQkxNR5AKC04tE+AMAtvfPOO2rSpIn+9a9/WdsWLVqkoKAgHTlyRLVq1ZIkNWrUSJMmTZIk1axZU++88442btyoTp06af369Tp27Jg2b94sf39/SdK0adPUqVMn6zEzH00rX768tU+mcuXK6Z133pGDg4Pq1KmjHj16aOPGjXr88cezrXnmzJkaMmSInn76aUlSRESEduzYoZkzZ6p9+/aqWLGinJ2d5erqmuVct5LTdWZ66KGHNHz4cEnSq6++qvXr12vevHl69913b3l8JycneXt7y2KxmK4t04YNG3T48GF9//33CgwMlCT961//Urdu3ax9Tp06pZo1ayosLEwWi0XBwcF5OhcAlEaMSAEAbmnfvn2KioqSh4eHdalTp46kv98zytSoUSOb/QICAnT+/HlJUkxMjIKCgmyCQcuWLXNdQ/369eXg4JDtsbNz6NAhtWnTxqatTZs2uR4VyklO15mpdevWWdbz49y5dejQIQUFBVlDVHY1DRkyRNHR0apdu7bGjBmjH374odDqA4DijhEpAMAtJScnq2fPnnrjjTeybAsICLD+d9myZW22WSwWZWRk5EsNBXnswq6lTJm//39MwzCsbbd636sgNG3aVMePH9fatWu1YcMG9e3bVx07drR5nwsAkD1GpAAAt9S0aVMdPHhQISEhqlGjhs3i7u6eq2PUrl1bp0+f1rlz56xtu3btsunj5OQk6e/3qW5X3bp1tW3bNpu2bdu2qV69erd97NzYsWNHlvW6detK+t8jjPHx8dbtN0757uTkdFufQ926dXX69Gmbc9xYkyR5eXmpX79++uCDD/TFF1/o66+/1sWLF/N8XgAoLRiRAgBYJSYmZvkHfeYMeR988IEGDBhgna0uNjZWn3/+uT788EObR+5uplOnTqpevboGDx6sGTNm6PLly3rppZck/T2iI0l+fn5ydXXVunXrVKVKFbm4uOR5+vHnnntOffv2VZMmTdSxY0etXr1aK1as0IYNG/J0PLOWL1+u5s2bKywsTEuXLtUvv/yijz76SJJUo0YNBQUFafLkyZo2bZqOHDmiWbNm2ewfEhKi5ORkbdy4UY0bN5abm5vc3Nxyff6OHTuqVq1aGjx4sN58800lJSVlmdxj9uzZCggIUJMmTVSmTBktX75c/v7+pr+/CgBKI0akAABWmzdvVpMmTWyWKVOmKDAwUNu2bVN6ero6d+6shg0b6plnnpGPj4/1MbVbcXBw0KpVq5ScnKwWLVpo+PDh1n/Yu7i4SJIcHR319ttv67333lNgYKDuv//+PF9L7969NXfuXM2cOVP169fXe++9p8WLF2eZUr2gTJkyRZ9//rkaNWqkJUuW6LPPPrOOhpUtW1afffaZDh8+rEaNGumNN97Qa6+9ZrP/XXfdpSeffFL9+vVTxYoVNWPGDFPnL1OmjFauXKmrV6+qZcuWGj58uKZNm2bTx9PTUzNmzFDz5s3VokULnThxQv/+979z/TMFgNLMYvzzAW0AAArRtm3bFBYWptjYWFWvXt3e5eQbi8WilStX2nz/FACgZOHRPgBAoVm5cqU8PDxUs2ZNxcbGauzYsWrTpk2JClEAgNKBIAUAKDSXL1/WhAkTdOrUKVWoUEEdO3bM8m4Qsvfjjz/afAfUjZKTkwuxGgAAj/YBAFAMXL16VWfPnr3p9ho1ahRiNQAAghQAAAAAmMS0PAAAAABgEkEKAAAAAEwiSAEAAACASQQpAAAAADCJIAUAAAAAJhGkAAAAAMAkghQAAAAAmPT/sqs4+kF7nacAAAAA
SUVORK5CYII=",
      "text/plain": [
       "<Figure size 1000x600 with 1 Axes>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "plot_data_lengths(tokenized_train_dataset, tokenized_val_dataset)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "jP3R4enP3m19"
   },
   "source": [
    "### How does the base model do?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "Vxbl4ACsyRgi"
   },
   "source": [
    "Optionally, you can check how Phi-2 does on one of your data samples. For example, if you have a dataset that maps users' biometric data to health scores, you could test the following `eval_prompt`:"
   ]
  },
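  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If you plan to test several records, a small helper can build the prompt from a dict so every sample follows exactly the same template. The sketch below is hypothetical: `format_biometric_prompt` and the field names are illustrative, not part of any library, so adapt the template to whatever format your training data actually uses.\n",
    "\n",
    "```python\n",
    "def format_biometric_prompt(record, instruction):\n",
    "    \"\"\"Build a prompt laid out like the hand-written eval_prompt in the next cell.\n",
    "\n",
    "    record: hypothetical dict of field name -> value, emitted in insertion order.\n",
    "    \"\"\"\n",
    "    # One 'Key=Value' pair per line; all but the last line end with a comma.\n",
    "    data_lines = ',\\n'.join(f'{key}={value}' for key, value in record.items())\n",
    "    return (\n",
    "        f' {instruction}\\n'\n",
    "        '\\n'\n",
    "        '### Biometric Data:\\n'\n",
    "        f'{data_lines}\\n'\n",
    "        '\\n'\n",
    "        '### Health Score:\\n'\n",
    "    )\n",
    "\n",
    "\n",
    "record = {'Temperature': 98.2, 'Sex': 'F', 'Age': 29, 'Height': '69 inches',\n",
    "          'Weight': '160 lbs', 'V02_Max': 55, 'HRV': 55}\n",
    "prompt = format_biometric_prompt(\n",
    "    record, \"Given the following biometric data, score the user's health, from 0-100.\"\n",
    ")\n",
    "print(prompt)\n",
    "```"
   ]
  },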
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "id": "gOxnx-cAyRgi"
   },
   "outputs": [],
   "source": [
    "eval_prompt = \"\"\" Given the following biometric data, score the user's health, from 0-100.\n",
    "\n",
    "### Biometric Data:\n",
    "Temperature=98.2,\n",
    "Sex=F,\n",
    "Age=29,\n",
    "Height=69 inches,\n",
    "Weight=160 lbs,\n",
    "V02_Max=55,\n",
    "HRV=55\n",
    "\n",
    "### Health Score:\n",
    "\"\"\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "KRhfq_Fa3m19"
   },
   "source": [
    "The `eval_prompt` I used was:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "id": "pa6ux9ni3m19"
   },
   "outputs": [],
   "source": [
    "eval_prompt = \" The following is a note by Eevee the Dog: # \""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "id": "NidIuFXMyRgi",
    "outputId": "b1794b11-9a22-4b0a-e871-7df039ab59fc"
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      " The following is a note by Eevee the Dog: # \n",
      "I’m not sure if you know this, but I have been reading about how to be more environmentally friendly. It seems that there are many things we can do in our daily lives to help protect the planet and reduce waste. One of the most important things we can do is to recycle as much as possible. This means separating out materials like paper, plastic, glass, and metal so they can be reused instead of ending up in landfills or polluting our oceans. Another thing we can do is conserve energy by turning off lights when we leave a room, using energy-efficient appliances, and reducing our use of fossil fuels. We can also save water by taking shorter showers, fixing leaky faucets, and watering plants during cooler parts of the day. Finally, we can support companies and organizations that prioritize sustainability and environmental stewardship. By making these small changes in our own lives, we can all contribute to creating a healthier and more sustainable future for ourselves and generations to come.\n",
      "# \n",
      "By the way, did you know that dogs are actually very good at recycling? They love to play with old newspapers and cardboard boxes, and will often bury them in their backyard. And when it comes time to go potty, they always make sure\n"
     ]
    }
   ],
   "source": [
    "# Init an eval tokenizer so it doesn't add padding or eos token\n",
    "eval_tokenizer = AutoTokenizer.from_pretrained(\n",
    "    base_model_id,\n",
    "    add_bos_token=True,\n",
    "    use_fast=False, # needed for now, should be fixed soon\n",
    ")\n",
    "\n",
    "model_input = eval_tokenizer(eval_prompt, return_tensors=\"pt\").to(\"cuda\")\n",
    "\n",
    "model.eval()\n",
    "with torch.no_grad():\n",
    "    print(eval_tokenizer.decode(model.generate(**model_input, max_new_tokens=256, repetition_penalty=1.15)[0], skip_special_tokens=True))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "dCAWeCzZyRgi"
   },
   "source": [
    "Observe how the model does out of the box: it writes a generic note about recycling. This is clearly not my journal, lol."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "AapDoyfAyRgi"
   },
   "source": [
    "### 5. Set Up LoRA"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "Mp2gMi1ZzGET"
   },
   "source": [
    "Now, to start our fine-tuning, we have to apply some preprocessing to the model to prepare it for training. Let's set up our LoRA layers."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "id": "gkIcwsSU01EB"
   },
   "outputs": [],
   "source": [
    "def print_trainable_parameters(model):\n",
    "    \"\"\"\n",
    "    Prints the number of trainable parameters in the model.\n",
    "    \"\"\"\n",
    "    trainable_params = 0\n",
    "    all_param = 0\n",
    "    for _, param in model.named_parameters():\n",
    "        all_param += param.numel()\n",
    "        if param.requires_grad:\n",
    "            trainable_params += param.numel()\n",
    "    print(\n",
    "        f\"trainable params: {trainable_params} || all params: {all_param} || trainable%: {100 * trainable_params / all_param}\"\n",
    "    )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "cUYEpEK-yRgj"
   },
   "source": [
    "Let's print the model to examine its layers, since we will apply QLoRA to some of its linear layers: `Wqkv` (the fused query/key/value projection), `fc1`, and `fc2`."
   ]
  },
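  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Besides reading the printout in the next cell by eye, you can also list candidate `target_modules` in code. The helper below is a hypothetical sketch that works on `(qualified_name, class_name)` pairs, which you could build with `[(n, type(m).__name__) for n, m in model.named_modules()]`; the sample names are copied from the Phi-2 printout. It keeps modules whose class name contains `Linear` and skips the output head.\n",
    "\n",
    "```python\n",
    "def candidate_lora_targets(named_classes, skip_prefixes=('lm_head',)):\n",
    "    \"\"\"Return the sorted, unique leaf names of linear-like modules.\n",
    "\n",
    "    named_classes: iterable of (qualified_name, class_name) pairs.\n",
    "    skip_prefixes: qualified-name prefixes to ignore, e.g. the output head.\n",
    "    \"\"\"\n",
    "    leaves = set()\n",
    "    for qualified_name, class_name in named_classes:\n",
    "        if 'Linear' not in class_name or qualified_name.startswith(skip_prefixes):\n",
    "            continue\n",
    "        leaves.add(qualified_name.split('.')[-1])  # e.g. 'transformer.h.0.mlp.fc1' -> 'fc1'\n",
    "    return sorted(leaves)\n",
    "\n",
    "\n",
    "# A few entries for block 0, with names and classes as they appear in the printout.\n",
    "sample = [\n",
    "    ('transformer.h.0.mixer.Wqkv', 'Linear8bitLt'),\n",
    "    ('transformer.h.0.mixer.out_proj', 'Linear8bitLt'),\n",
    "    ('transformer.h.0.mlp.fc1', 'Linear8bitLt'),\n",
    "    ('transformer.h.0.mlp.fc2', 'Linear8bitLt'),\n",
    "    ('transformer.h.0.mlp.act', 'NewGELUActivation'),\n",
    "    ('lm_head.linear', 'Linear'),\n",
    "]\n",
    "print(candidate_lora_targets(sample))  # ['Wqkv', 'fc1', 'fc2', 'out_proj']\n",
    "```\n",
    "\n",
    "This notebook adapts only `Wqkv`, `fc1`, and `fc2`; adding `out_proj` is possible too, at the cost of more trainable parameters."
   ]
  },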
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {
    "id": "XshGNsbxyRgj",
    "outputId": "c619b0e8-8516-4d4b-9abe-13eaa3f3b204",
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "PhiForCausalLM(\n",
      "  (transformer): PhiModel(\n",
      "    (embd): Embedding(\n",
      "      (wte): Embedding(51200, 2560)\n",
      "      (drop): Dropout(p=0.0, inplace=False)\n",
      "    )\n",
      "    (h): ModuleList(\n",
      "      (0-31): 32 x ParallelBlock(\n",
      "        (ln): LayerNorm((2560,), eps=1e-05, elementwise_affine=True)\n",
      "        (resid_dropout): Dropout(p=0.1, inplace=False)\n",
      "        (mixer): MHA(\n",
      "          (rotary_emb): RotaryEmbedding()\n",
      "          (Wqkv): Linear8bitLt(in_features=2560, out_features=7680, bias=True)\n",
      "          (out_proj): Linear8bitLt(in_features=2560, out_features=2560, bias=True)\n",
      "          (inner_attn): SelfAttention(\n",
      "            (drop): Dropout(p=0.0, inplace=False)\n",
      "          )\n",
      "          (inner_cross_attn): CrossAttention(\n",
      "            (drop): Dropout(p=0.0, inplace=False)\n",
      "          )\n",
      "        )\n",
      "        (mlp): MLP(\n",
      "          (fc1): Linear8bitLt(in_features=2560, out_features=10240, bias=True)\n",
      "          (fc2): Linear8bitLt(in_features=10240, out_features=2560, bias=True)\n",
      "          (act): NewGELUActivation()\n",
      "        )\n",
      "      )\n",
      "    )\n",
      "  )\n",
      "  (lm_head): CausalLMHead(\n",
      "    (ln): LayerNorm((2560,), eps=1e-05, elementwise_affine=True)\n",
      "    (linear): Linear(in_features=2560, out_features=51200, bias=True)\n",
      "  )\n",
      "  (loss): CausalLMLoss(\n",
      "    (loss_fct): CrossEntropyLoss()\n",
      "  )\n",
      ")\n"
     ]
    }
   ],
   "source": [
    "print(model)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "I6mTLuQJyRgj"
   },
   "source": [
    "Here we define the LoRA config.\n",
    "\n",
    "`r` is the rank of the low-rank matrices used in the adapters, so it controls the number of parameters trained. A higher rank allows for more expressivity, at the cost of more compute and memory.\n",
    "\n",
    "`lora_alpha` is the scaling factor for the learned weights: the LoRA update (the product of the two low-rank adapter matrices) is scaled by `lora_alpha/r`, so a higher `lora_alpha` gives the LoRA activations more weight relative to the frozen base model.\n",
    "\n",
    "The QLoRA paper used `r=64` and `lora_alpha=16`, and these values are said to generalize well. Here we use `r=32` and `lora_alpha=64` instead: the smaller rank halves the number of adapter parameters, and the larger ratio (`64/32 = 2` vs. `16/64 = 0.25`) puts more emphasis on the new fine-tuning data."
   ]
  },
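  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make `r` and `lora_alpha` concrete, here is a minimal NumPy sketch of a LoRA forward pass for one linear layer, assuming the formulation from the original LoRA paper: the frozen weight `W` stays untouched and a low-rank update `B @ A` is added, scaled by `lora_alpha / r`. The sizes are toy values, and this is an illustration, not PEFT's internal code.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "\n",
    "def lora_forward(x, W, A, B, lora_alpha, r):\n",
    "    \"\"\"Compute W x + (lora_alpha / r) * B (A x) for one input vector x.\"\"\"\n",
    "    scaling = lora_alpha / r\n",
    "    return W @ x + scaling * (B @ (A @ x))\n",
    "\n",
    "\n",
    "rng = np.random.default_rng(0)\n",
    "d_in, d_out, r, lora_alpha = 8, 6, 2, 4  # toy sizes; the notebook uses r=32, lora_alpha=64\n",
    "\n",
    "W = rng.standard_normal((d_out, d_in))     # frozen pretrained weight\n",
    "A = rng.standard_normal((r, d_in)) * 0.01  # trainable, small random init\n",
    "B = np.zeros((d_out, r))                   # trainable, zero init\n",
    "x = rng.standard_normal(d_in)\n",
    "\n",
    "# With B = 0 the adapter adds nothing, so training starts exactly at the base model.\n",
    "print(np.allclose(lora_forward(x, W, A, B, lora_alpha, r), W @ x))  # True\n",
    "\n",
    "# Trainable values per adapted layer: r * (d_in + d_out) instead of d_in * d_out.\n",
    "print(r * (d_in + d_out), d_in * d_out)  # 28 48\n",
    "\n",
    "# Scaling factor in this notebook vs. the QLoRA paper's settings.\n",
    "print(64 / 32, 16 / 64)  # 2.0 0.25\n",
    "```"
   ]
  },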
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "id": "Ybeyl20n3dYH",
    "outputId": "6a16c182-04d9-4812-ae81-502a8fe364d0"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "trainable params: 36700160 || all params: 2816384000 || trainable%: 1.3030950324955688\n"
     ]
    }
   ],
   "source": [
    "from peft import LoraConfig, get_peft_model\n",
    "\n",
    "config = LoraConfig(\n",
    "    r=32,\n",
    "    lora_alpha=64,\n",
    "    target_modules=[\n",
    "        \"Wqkv\",\n",
    "        \"fc1\",\n",
    "        \"fc2\",\n",
    "    ],\n",
    "    bias=\"none\",\n",
    "    lora_dropout=0.05,  # Conventional\n",
    "    task_type=\"CAUSAL_LM\",\n",
    ")\n",
    "\n",
    "model = get_peft_model(model, config)\n",
    "print_trainable_parameters(model)"
   ]
  },
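  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `trainable params` figure above can be checked by hand. Each adapted layer with `in_features = d_in` and `out_features = d_out` gets a `lora_A` weight of shape `r × d_in` and a `lora_B` weight of shape `d_out × r`, i.e. `r * (d_in + d_out)` new parameters (the adapter layers carry no bias, and `bias=\"none\"` leaves the existing biases frozen). Using the layer shapes from the model printout and Phi-2's 32 blocks:\n",
    "\n",
    "```python\n",
    "r = 32\n",
    "num_blocks = 32  # '(0-31): 32 x ParallelBlock' in the printout\n",
    "\n",
    "# (in_features, out_features) of each targeted layer, from the printout\n",
    "target_shapes = {\n",
    "    'Wqkv': (2560, 7680),\n",
    "    'fc1': (2560, 10240),\n",
    "    'fc2': (10240, 2560),\n",
    "}\n",
    "\n",
    "# lora_A holds r * d_in values and lora_B holds d_out * r values.\n",
    "per_block = sum(r * (d_in + d_out) for d_in, d_out in target_shapes.values())\n",
    "trainable = per_block * num_blocks\n",
    "print(per_block, trainable)  # 1146880 36700160\n",
    "\n",
    "all_params = 2816384000  # 'all params' as reported by print_trainable_parameters\n",
    "print(f'{100 * trainable / all_params:.2f}%')  # 1.30%\n",
    "```\n",
    "\n",
    "This matches the 36,700,160 trainable parameters reported above; halving `r` would halve that count."
   ]
  },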
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "X_FHi_VLyRgn"
   },
   "source": [
    "See how the model looks different now, with the LoRA adapters added:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {
    "id": "IaYMWak4yRgn"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "PeftModelForCausalLM(\n",
      "  (base_model): LoraModel(\n",
      "    (model): PhiForCausalLM(\n",
      "      (transformer): PhiModel(\n",
      "        (embd): Embedding(\n",
      "          (wte): Embedding(51200, 2560)\n",
      "          (drop): Dropout(p=0.0, inplace=False)\n",
      "        )\n",
      "        (h): ModuleList(\n",
      "          (0-31): 32 x ParallelBlock(\n",
      "            (ln): LayerNorm((2560,), eps=1e-05, elementwise_affine=True)\n",
      "            (resid_dropout): Dropout(p=0.1, inplace=False)\n",
      "            (mixer): MHA(\n",
      "              (rotary_emb): RotaryEmbedding()\n",
      "              (Wqkv): lora.Linear8bitLt(\n",
      "                (base_layer): Linear8bitLt(in_features=2560, out_features=7680, bias=True)\n",
      "                (lora_dropout): ModuleDict(\n",
      "                  (default): Dropout(p=0.05, inplace=False)\n",
      "                )\n",
      "                (lora_A): ModuleDict(\n",
      "                  (default): Linear(in_features=2560, out_features=32, bias=False)\n",
      "                )\n",
      "                (lora_B): ModuleDict(\n",
      "                  (default): Linear(in_features=32, out_features=7680, bias=False)\n",
      "                )\n",
      "                (lora_embedding_A): ParameterDict()\n",
      "                (lora_embedding_B): ParameterDict()\n",
      "              )\n",
      "              (out_proj): Linear8bitLt(in_features=2560, out_features=2560, bias=True)\n",
      "              (inner_attn): SelfAttention(\n",
      "                (drop): Dropout(p=0.0, inplace=False)\n",
      "              )\n",
      "              (inner_cross_attn): CrossAttention(\n",
      "                (drop): Dropout(p=0.0, inplace=False)\n",
      "              )\n",
      "            )\n",
      "            (mlp): MLP(\n",
      "              (fc1): lora.Linear8bitLt(\n",
      "                (base_layer): Linear8bitLt(in_features=2560, out_features=10240, bias=True)\n",
      "                (lora_dropout): ModuleDict(\n",
      "                  (default): Dropout(p=0.05, inplace=False)\n",
      "                )\n",
      "                (lora_A): ModuleDict(\n",
      "                  (default): Linear(in_features=2560, out_features=32, bias=False)\n",
      "                )\n",
      "                (lora_B): ModuleDict(\n",
      "                  (default): Linear(in_features=32, out_features=10240, bias=False)\n",
      "                )\n",
      "                (lora_embedding_A): ParameterDict()\n",
      "                (lora_embedding_B): ParameterDict()\n",
      "              )\n",
      "              (fc2): lora.Linear8bitLt(\n",
      "                (base_layer): Linear8bitLt(in_features=10240, out_features=2560, bias=True)\n",
      "                (lora_dropout): ModuleDict(\n",
      "                  (default): Dropout(p=0.05, inplace=False)\n",
      "                )\n",
      "                (lora_A): ModuleDict(\n",
      "                  (default): Linear(in_features=10240, out_features=32, bias=False)\n",
      "                )\n",
      "                (lora_B): ModuleDict(\n",
      "                  (default): Linear(in_features=32, out_features=2560, bias=False)\n",
      "                )\n",
      "                (lora_embedding_A): ParameterDict()\n",
      "                (lora_embedding_B): ParameterDict()\n",
      "              )\n",
      "              (act): NewGELUActivation()\n",
      "            )\n",
      "          )\n",
      "        )\n",
      "      )\n",
      "      (lm_head): CausalLMHead(\n",
      "        (ln): LayerNorm((2560,), eps=1e-05, elementwise_affine=True)\n",
      "        (linear): Linear(in_features=2560, out_features=51200, bias=True)\n",
      "      )\n",
      "      (loss): CausalLMLoss(\n",
      "        (loss_fct): CrossEntropyLoss()\n",
      "      )\n",
      "    )\n",
      "  )\n",
      ")\n"
     ]
    }
   ],
   "source": [
    "print(model)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "_0MOtwf3zdZp"
   },
   "source": [
    "### 6. Run Training!"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "fEe0uWYSyRgo"
   },
   "source": [
    "I didn't have many training samples: only about 200 across train and validation. I used 500 training steps and was fine with overfitting in this case; the end product worked well. It took about 20 minutes on the 1x A10G 24GB.\n",
    "\n",
    "Overfitting is when the validation loss goes up (bad) while the training loss keeps going down, meaning the model is learning the training set really well but is unable to generalize to new data points. In most cases this is undesirable, but since I was just playing around with a model that generates outputs like my journal entries, I was fine with a moderate amount of overfitting.\n",
    "\n",
    "With that said, a note on training: you can set `max_steps` high initially and check at which step your model's performance starts to degrade. That is where you'll find the sweet spot for how many steps to run. For example, say you start with 1000 steps and find that the model starts overfitting, as described above, at around step 500. Then 500 steps is your sweet spot, and you would use the `checkpoint-500` directory in your output dir (`phi2-journal-finetune`) as your final model in step 7 below.\n",
    "\n",
    "If you're just doing this for fun like I did and are OK with overfitting, you can try checkpoints with different degrees of overfitting.\n",
    "\n",
    "You can stop training at any point via Kernel -> Interrupt Kernel in the top nav bar once you see that further training isn't needed."
   ]
  },
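  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Rather than eyeballing the loss table, you can look up the step with the lowest validation loss after training. `trainer.state.log_history` is the list of dicts the `Trainer` logs; evaluation entries carry `eval_loss` and `step` keys. The helper below is a hypothetical sketch, demonstrated on the validation losses from the training run below. Since `save_steps` equals `eval_steps` in that run and no `save_total_limit` is set, every evaluated step has a matching `checkpoint-<step>` directory.\n",
    "\n",
    "```python\n",
    "def best_checkpoint(log_history, output_dir):\n",
    "    \"\"\"Return (step, eval_loss, checkpoint_path) for the lowest eval_loss entry.\"\"\"\n",
    "    evals = [entry for entry in log_history if 'eval_loss' in entry]  # skip training-loss entries\n",
    "    best = min(evals, key=lambda entry: entry['eval_loss'])\n",
    "    step = best['step']\n",
    "    return step, best['eval_loss'], f'{output_dir}/checkpoint-{step}'\n",
    "\n",
    "\n",
    "# Validation losses from the training table below, logged every 25 steps.\n",
    "val_losses = [2.521252, 2.524361, 2.517028, 2.523030, 2.517590,\n",
    "              2.515742, 2.516005, 2.524362, 2.523537, 2.524427,\n",
    "              2.530054, 2.536566, 2.533651, 2.543580, 2.547194,\n",
    "              2.552465, 2.551737, 2.557251, 2.556605, 2.559073]\n",
    "history = [{'step': 25 * (i + 1), 'eval_loss': loss} for i, loss in enumerate(val_losses)]\n",
    "\n",
    "# After training you would pass trainer.state.log_history instead of `history`.\n",
    "print(best_checkpoint(history, './phi2-journal-finetune'))\n",
    "# (150, 2.515742, './phi2-journal-finetune/checkpoint-150')\n",
    "```\n",
    "\n",
    "In this run the lowest validation loss comes at step 150, while the notebook deliberately loads `checkpoint-500` later, accepting the overfitting described above. Alternatively, `TrainingArguments` accepts `load_best_model_at_end=True` together with `metric_for_best_model=\"eval_loss\"`, which makes the `Trainer` reload the best checkpoint automatically when training finishes."
   ]
  },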
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {
    "id": "yxSbpKQSLY6B"
   },
   "outputs": [],
   "source": [
    "model = accelerator.prepare_model(model)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {
    "id": "c_L1131GyRgo"
   },
   "outputs": [],
   "source": [
    "if torch.cuda.device_count() > 1: # If more than 1 GPU\n",
    "    model.is_parallelizable = True\n",
    "    model.model_parallel = True"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {
    "id": "jq0nX33BmfaC"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "\n",
       "    <div>\n",
       "      \n",
       "      <progress value='500' max='500' style='width:300px; height:20px; vertical-align: middle;'></progress>\n",
       "      [500/500 08:04, Epoch 6/7]\n",
       "    </div>\n",
       "    <table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       " <tr style=\"text-align: left;\">\n",
       "      <th>Step</th>\n",
       "      <th>Training Loss</th>\n",
       "      <th>Validation Loss</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>25</td>\n",
       "      <td>2.566300</td>\n",
       "      <td>2.521252</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>50</td>\n",
       "      <td>2.550000</td>\n",
       "      <td>2.524361</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>75</td>\n",
       "      <td>2.361900</td>\n",
       "      <td>2.517028</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>100</td>\n",
       "      <td>2.471400</td>\n",
       "      <td>2.523030</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>125</td>\n",
       "      <td>2.467600</td>\n",
       "      <td>2.517590</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>150</td>\n",
       "      <td>2.334200</td>\n",
       "      <td>2.515742</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>175</td>\n",
       "      <td>2.315900</td>\n",
       "      <td>2.516005</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>200</td>\n",
       "      <td>2.177900</td>\n",
       "      <td>2.524362</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>225</td>\n",
       "      <td>2.312200</td>\n",
       "      <td>2.523537</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>250</td>\n",
       "      <td>2.247900</td>\n",
       "      <td>2.524427</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>275</td>\n",
       "      <td>2.265700</td>\n",
       "      <td>2.530054</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>300</td>\n",
       "      <td>2.218600</td>\n",
       "      <td>2.536566</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>325</td>\n",
       "      <td>2.333700</td>\n",
       "      <td>2.533651</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>350</td>\n",
       "      <td>2.231900</td>\n",
       "      <td>2.543580</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>375</td>\n",
       "      <td>2.191800</td>\n",
       "      <td>2.547194</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>400</td>\n",
       "      <td>2.108500</td>\n",
       "      <td>2.552465</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>425</td>\n",
       "      <td>2.225200</td>\n",
       "      <td>2.551737</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>450</td>\n",
       "      <td>2.181900</td>\n",
       "      <td>2.557251</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>475</td>\n",
       "      <td>2.237400</td>\n",
       "      <td>2.556605</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>500</td>\n",
       "      <td>2.090700</td>\n",
       "      <td>2.559073</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table><p>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": [
       "TrainOutput(global_step=500, training_loss=2.294539993286133, metrics={'train_runtime': 485.375, 'train_samples_per_second': 2.06, 'train_steps_per_second': 1.03, 'total_flos': 8249278464000000.0, 'train_loss': 2.294539993286133, 'epoch': 6.1})"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import transformers\n",
    "from datetime import datetime\n",
    "\n",
    "project = \"journal-finetune\"\n",
    "base_model_name = \"phi2\"\n",
    "run_name = base_model_name + \"-\" + project\n",
    "output_dir = \"./\" + run_name\n",
    "\n",
    "trainer = transformers.Trainer(\n",
    "    model=model,\n",
    "    train_dataset=tokenized_train_dataset,\n",
    "    eval_dataset=tokenized_val_dataset,\n",
    "    args=transformers.TrainingArguments(\n",
    "        output_dir=output_dir,\n",
    "        warmup_steps=1,\n",
    "        per_device_train_batch_size=2,\n",
    "        gradient_accumulation_steps=1,\n",
    "        max_steps=500,\n",
    "        learning_rate=2.5e-5, # Want a small lr for finetuning\n",
    "        optim=\"paged_adamw_8bit\",\n",
    "        logging_steps=25,              # Log the training loss every 25 steps\n",
    "        logging_dir=\"./logs\",        # Directory for storing logs\n",
    "        save_strategy=\"steps\",       # Save checkpoints on a step schedule...\n",
    "        save_steps=25,                # ...every 25 steps (checkpoint-25, checkpoint-50, ...)\n",
    "        evaluation_strategy=\"steps\", # Evaluate on a step schedule (named eval_strategy in newer transformers)...\n",
    "        eval_steps=25,               # ...every 25 steps\n",
    "        do_eval=True,                # Not read by Trainer itself; evaluation is driven by evaluation_strategy\n",
    "        report_to=\"wandb\",           # Set to \"none\" if you don't want to use Weights & Biases\n",
    "        run_name=f\"{run_name}-{datetime.now().strftime('%Y-%m-%d-%H-%M')}\"  # Name of the W&B run (optional)\n",
    "    ),\n",
    "    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False),\n",
    ")\n",
    "\n",
    "model.config.use_cache = False  # silence the warnings. Please re-enable for inference!\n",
    "trainer.train()"
   ]
  },
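  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If you'd rather not interrupt the kernel by hand, `transformers` provides an `EarlyStoppingCallback` (passed to the `Trainer` via `callbacks=[...]`) that stops training once the evaluation metric has failed to improve for `early_stopping_patience` evaluations; it requires `load_best_model_at_end=True` and a `metric_for_best_model`. The pure-Python sketch below only simulates a patience rule of that kind on the validation losses logged above, to show where it would have stopped this run; it is not the callback's actual code.\n",
    "\n",
    "```python\n",
    "def early_stop_step(steps, losses, patience, min_delta=0.0):\n",
    "    \"\"\"Return (stop_step, best_step) under a simple patience rule.\n",
    "\n",
    "    An evaluation counts as an improvement only if it beats the best loss so far\n",
    "    by more than min_delta; training stops after `patience` non-improving\n",
    "    evaluations in a row. stop_step is None if the rule never fires.\n",
    "    \"\"\"\n",
    "    best_loss, best_step, bad_evals = float('inf'), None, 0\n",
    "    for step, loss in zip(steps, losses):\n",
    "        if loss < best_loss - min_delta:\n",
    "            best_loss, best_step, bad_evals = loss, step, 0\n",
    "        else:\n",
    "            bad_evals += 1\n",
    "            if bad_evals >= patience:\n",
    "                return step, best_step\n",
    "    return None, best_step\n",
    "\n",
    "\n",
    "steps = list(range(25, 525, 25))  # evaluations at steps 25, 50, ..., 500\n",
    "val_losses = [2.521252, 2.524361, 2.517028, 2.523030, 2.517590,\n",
    "              2.515742, 2.516005, 2.524362, 2.523537, 2.524427,\n",
    "              2.530054, 2.536566, 2.533651, 2.543580, 2.547194,\n",
    "              2.552465, 2.551737, 2.557251, 2.556605, 2.559073]\n",
    "\n",
    "print(early_stop_step(steps, val_losses, patience=3))  # (225, 150)\n",
    "```\n",
    "\n",
    "With `patience=3`, the rule fires at step 225, and the best evaluation up to that point was at step 150."
   ]
  },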
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "0D57XqcsyRgo"
   },
   "source": [
    "### 7. Drum Roll... Try the Trained Model!\n",
    "\n",
    "It's a good idea to kill the current process so that you don't run out of memory loading the base model again on top of the model we just trained. Go to `Kernel > Restart Kernel`, or kill the process from the Terminal: run `nvidia-smi` to find its process ID, then `kill <PID>`.\n",
    "\n",
    "By default, the PEFT library only saves the QLoRA adapters, so we first need to load the base model from the Hugging Face Hub:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {
    "colab": {
     "referenced_widgets": [
      "fb8230fb86884aa6be318e2d03a88af2"
     ]
    },
    "id": "SKSnF016yRgp",
    "outputId": "bce5209d-90da-4117-c6ac-cda9f3cb3422"
   },
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "d34ea8fd074c47f8accb9cdd206339c0",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.\n"
     ]
    }
   ],
   "source": [
    "import torch\n",
    "from transformers import AutoTokenizer, AutoModelForCausalLM\n",
    "\n",
    "base_model_id = \"microsoft/phi-2\"\n",
    "base_model = AutoModelForCausalLM.from_pretrained(\n",
    "    base_model_id,  # Phi2, same as before\n",
    "    device_map=\"auto\",\n",
    "    trust_remote_code=True,\n",
    "    load_in_8bit=True,  # newer transformers versions prefer quantization_config=BitsAndBytesConfig(load_in_8bit=True)\n",
    "    torch_dtype=torch.float16,\n",
    ")\n",
    "\n",
    "eval_tokenizer = AutoTokenizer.from_pretrained(base_model_id, add_bos_token=True, trust_remote_code=True, use_fast=False)\n",
    "eval_tokenizer.pad_token = eval_tokenizer.eos_token  # `tokenizer` no longer exists after the kernel restart"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "_BxOhAiqyRgp"
   },
   "source": [
    "Now load the QLoRA adapter from the checkpoint directory you settled on above (I used `checkpoint-500`):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {
    "id": "GwsiqhWuyRgp"
   },
   "outputs": [],
   "source": [
    "from peft import PeftModel\n",
    "\n",
    "ft_model = PeftModel.from_pretrained(base_model, \"phi2-journal-finetune/checkpoint-500\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "lX39ibolyRgp"
   },
   "source": [
    "Then run inference!"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "UUehsaVNyRgp"
   },
   "source": [
    "Let's try a prompt like the `eval_prompt` above, this time starting with \"Today I\", and see whether the fine-tuned model does better. Since we restarted the kernel, `model_input` has to be rebuilt with the new `eval_tokenizer`. I like playing with the repetition penalty (just little tweaks of .01-.05 at a time). THIS IS SO FUN. I'm obsessed with this AI version of myself."
   ]
  },
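  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For intuition about `repetition_penalty`: in Hugging Face `transformers` it is applied by `RepetitionPenaltyLogitsProcessor`, following the CTRL paper, to the logits of tokens that already appear in the sequence (prompt included). Positive logits are divided by the penalty and negative ones multiplied by it, so repeated tokens become less likely. Here is a pure-Python sketch of that rule on toy logits; the vocabulary and numbers are made up for illustration.\n",
    "\n",
    "```python\n",
    "import math\n",
    "\n",
    "\n",
    "def apply_repetition_penalty(logits, seen_token_ids, penalty):\n",
    "    \"\"\"Divide positive / multiply negative logits of already-seen tokens by `penalty`.\"\"\"\n",
    "    out = list(logits)\n",
    "    for token_id in set(seen_token_ids):\n",
    "        score = out[token_id]\n",
    "        out[token_id] = score * penalty if score < 0 else score / penalty\n",
    "    return out\n",
    "\n",
    "\n",
    "def softmax(logits):\n",
    "    m = max(logits)  # subtract the max for numerical stability\n",
    "    exps = [math.exp(v - m) for v in logits]\n",
    "    total = sum(exps)\n",
    "    return [e / total for e in exps]\n",
    "\n",
    "\n",
    "logits = [2.0, 1.0, -0.5, 0.5]  # toy 4-token vocabulary\n",
    "seen = [0, 2]                   # tokens already in the sequence (prompt or generated)\n",
    "\n",
    "for penalty in (1.0, 1.11, 1.15):\n",
    "    probs = softmax(apply_repetition_penalty(logits, seen, penalty))\n",
    "    print(penalty, [round(p, 3) for p in probs])\n",
    "```\n",
    "\n",
    "Raising the penalty from 1.0 to 1.15 steadily lowers the probability of the repeated token 0, which is why even small tweaks can change what the model writes."
   ]
  },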
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {
    "id": "lMkVNEUvyRgp",
    "outputId": "7d49d409-5dbe-4306-c1a4-9d87e3073397"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      " The following is a note by Eevee the Dog: # Today I \n",
      "\n",
      "Today I am grateful for my health. I’m healthy and happy, and I have so much to be thankful for. I feel blessed every day. I love my life and all of its imperfections. I know that tomorrow will bring more challenges but I also know that I can overcome them with grace and patience. I am at peace today. I am in touch with myself and with God. I am surrounded by people who love me and support me. I am exactly where I\n"
     ]
    }
   ],
   "source": [
    "eval_prompt = \" The following is a note by Eevee the Dog: # Today I \"\n",
    "model_input = eval_tokenizer(eval_prompt, return_tensors=\"pt\").to(\"cuda\")\n",
    "\n",
    "ft_model.eval()\n",
    "with torch.no_grad():\n",
    "    print(eval_tokenizer.decode(ft_model.generate(**model_input, max_new_tokens=100, repetition_penalty=1.11)[0], skip_special_tokens=True))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "VCJnpZoayRgq"
   },
   "source": [
    "### Sweet... it worked! The fine-tuned model now prints out journal entries in my style!\n",
    "\n",
    "How funny to see it write like me as an angsty teenager, and honestly as an adult too. I am obsessed. It knows who my friends are, talks about them, and covers the same topics I usually write about. It's really cool.\n",
    "\n",
    "I hope you enjoyed this tutorial on fine-tuning Microsoft's Phi-2 on your own data. If you have any questions, feel free to reach out to me on [X](https://x.com/harperscarroll) or [Discord](https://discord.gg/RN2a436M73).\n",
    "\n",
    "🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙 🤙"
   ]
  }
 ],
 "metadata": {
  "accelerator": "GPU",
  "colab": {
   "gpuType": "T4",
   "provenance": []
  },
  "gpuClass": "standard",
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
